Abstract
JavaScript is a dynamic scripting language used to make web pages more interactive. Attackers, however, exploit this dynamism to execute malicious code in web pages, which poses a serious threat. Traditional detection methods based on static features struggle to detect obfuscated malicious code, while methods based on dynamic analysis are inefficient. This paper proposes a static detection model based on semantic analysis. Specifically, lexical unit sequences are extracted from abstract syntax trees; a word vector model is then trained on those sequences with word2vec; finally, the resulting sequence vectors are fed into an LSTM network to detect malicious JavaScript. Experiments show that the model detects obfuscated malicious JavaScript code effectively and improves detection speed, with a precision of 99.94% and a recall of 98.33%.
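The first stage of the pipeline described in the abstract (turning an abstract syntax tree into a sequence of lexical units) can be illustrated with a minimal sketch. The abstract does not name a parser or define the lexical units precisely, so the following are assumptions: the AST is an ESTree-style dictionary such as a JavaScript parser might emit, each node's `type` stands in for a lexical unit, and the helper names `node_types`, `build_vocab`, and `encode` are hypothetical. The word2vec and LSTM stages are omitted; the padded integer ids only show the shape of data such a model would consume.

```python
from collections import Counter


def node_types(node):
    """Yield the `type` of every AST node in depth-first, pre-order."""
    if isinstance(node, dict):
        if "type" in node:
            yield node["type"]
        for value in node.values():
            yield from node_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from node_types(item)


def build_vocab(sequences):
    """Map each distinct unit to an integer id; 0 = padding, 1 = unknown."""
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, _ in counts.most_common():
        vocab[tok] = len(vocab)
    return vocab


def encode(seq, vocab, max_len):
    """Truncate or right-pad a unit sequence to exactly `max_len` ids."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in seq[:max_len]]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


# Assumed ESTree-style AST for the one-line script `eval(x)`.
example_ast = {
    "type": "Program",
    "body": [{
        "type": "ExpressionStatement",
        "expression": {
            "type": "CallExpression",
            "callee": {"type": "Identifier", "name": "eval"},
            "arguments": [{"type": "Identifier", "name": "x"}],
        },
    }],
}

sequence = list(node_types(example_ast))
vocab = build_vocab([sequence])
print(sequence)
print(encode(sequence, vocab, max_len=8))
```

In a full system, the unit sequences (rather than integer ids) would typically be used to train the word vector model, and the resulting vectors would form the LSTM input; those choices are not specified in the abstract.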
Authors
邱瑶瑶
方勇
黄诚
刘亮
张星
QIU Yao-Yao; FANG Yong; HUANG Cheng; LIU Liang; ZHANG Xing (College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China; College of Cybersecurity, Sichuan University, Chengdu 610065, China; Nsfocus Information Technology Company, Limited, Beijing 100089, China)
Source
《四川大学学报(自然科学版)》
CAS
CSCD
PKU Core Journals (北大核心)
2019, No. 2, pp. 273-278 (6 pages)
Journal of Sichuan University (Natural Science Edition)
Funding
CCF-NSFOCUS "Kunpeng" Research Fund (2018008)