Abstract
To address the difficulty of analyzing large volumes of web security vulnerability data, the keyword extraction technique TextRank is applied to extract vulnerability keywords. First, security vulnerability data published on Exploit-db between 1999 and 2015 are collected with a web crawler and regular expressions; the analysis shows that both the quantity and the quality of web security vulnerabilities on the site have declined steadily over the last six years. Second, TextRank is used to extract the vulnerability keywords for each year; the results show that the main vulnerability types change little from year to year, that injection vulnerabilities are the dominant type, that WordPress is the application with the most vulnerabilities, and that PHP is the platform on which the most vulnerabilities appear. Finally, the causes of the continuing decrease in the number of web security vulnerabilities are examined.
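As a rough illustration of the TextRank step mentioned in the abstract, the sketch below ranks words from a handful of vulnerability titles. It is not the paper's implementation. The sample titles are invented for illustration and are not taken from Exploit-db; the function name textrank_keywords, the window size, the damping factor, and the tiny stopword list (standing in for the part-of-speech filter used in the original TextRank formulation) are all assumptions.

```python
import re
from collections import defaultdict


def textrank_keywords(titles, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words with a simplified TextRank over a co-occurrence graph."""
    # Assumed stopword list; real TextRank usually filters by part of speech.
    stopwords = {"a", "an", "and", "for", "in", "of", "the", "to", "via"}

    # Undirected graph: words are linked if they co-occur within `window`
    # positions inside the same title.
    neighbors = defaultdict(set)
    for title in titles:
        words = [w for w in re.findall(r"[a-z]+", title.lower())
                 if w not in stopwords]
        for i, word in enumerate(words):
            for other in words[i + 1:i + 1 + window]:
                if other != word:
                    neighbors[word].add(other)
                    neighbors[other].add(word)

    # PageRank-style iteration: each word shares its score equally
    # among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word]
            )
            for word in neighbors
        }

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


# Hypothetical titles, made up for this example only.
sample_titles = [
    "WordPress plugin SQL injection",
    "Joomla component SQL injection",
    "PHP forum remote file inclusion",
    "WordPress theme cross site scripting",
    "PHP gallery SQL injection",
]
keywords = textrank_keywords(sample_titles)
print(keywords)
```

Building the graph per title, instead of over one concatenated text, keeps words from unrelated entries from being linked across title boundaries. Running the same function on the titles of each year separately would give per-year keyword lists of the kind the abstract describes.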
Source
Electronic Information Warfare Technology (《电子信息对抗技术》)
2016, No. 5, pp. 52-56 (5 pages)
Keywords
cyberspace security
web security vulnerability
key phrase extraction
TextRank