Abstract
Malicious Web pages are a new kind of Web-based attack. In a drive-by-download exploit, the attacker embeds malicious code in a Web page. When a victim visits the page, the code tries to exploit vulnerabilities in the browser or its plugins to covertly carry out a series of malicious actions in the background, such as downloading and executing malware. To address the problem of extracting static features from malicious Web pages, this paper selects 14 existing features with high information gain and, by analyzing the obfuscation techniques used in malicious pages, proposes 8 new features. Together they form a 22-dimensional static feature set. In addition, two improvements to the existing feature extraction process are proposed: (1) preprocessing raw Web pages according to their character encoding; (2) feeding the HTML code dynamically generated by JavaScript back into the extractor so that further HTML-related features can be extracted. Experiments on both an unbalanced and a balanced data set indicate that the proposed feature set is effective to a certain degree.
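The abstract ranks existing features by information gain before keeping the 14 highest-scoring ones. As a minimal sketch of that ranking step, assuming every feature has already been discretized, the Python below computes IG(Y; X) = H(Y) - H(Y | X) for each feature X against the benign/malicious label Y and keeps the top k. The helper names (`entropy`, `information_gain`, `select_top_k`) and the toy feature names (`has_iframe`, `uses_eval`, `long_string`) are hypothetical illustrations; the paper's actual feature list and any discretization it applies are not given here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v), for one discrete feature X."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)  # split labels by feature value
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(feature_table, labels, k):
    """Rank features (name -> column of values) by information gain; keep the k best names."""
    scores = {name: information_gain(col, labels) for name, col in feature_table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical toy data: 1 = malicious page, 0 = benign page.
    labels = [1, 1, 0, 0]
    features = {
        "has_iframe": [1, 1, 0, 0],   # perfectly separates the classes
        "uses_eval": [1, 1, 1, 0],    # partially informative
        "long_string": [1, 0, 1, 0],  # independent of the label
    }
    print(select_top_k(features, labels, 2))  # ['has_iframe', 'uses_eval']
```

In the paper's setting the same call would be made over the pool of previously proposed features with k = 14; the 8 obfuscation-based features are added afterwards rather than selected this way.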
Source
Computer Systems & Applications (《计算机系统应用》), 2016, No. 7, pp. 213-218 (6 pages)