Detecting and De-Obfuscating Obfuscated Malicious JavaScript Code

Abstract: Obfuscated malicious JavaScript code is difficult both to detect and to de-obfuscate, which makes detecting and de-obfuscating it effectively and efficiently an emerging and crucial problem. To address it, this paper analyzes in depth the external static behavioral features and the internal dynamic runtime features of obfuscated JavaScript code, proposes a method for detecting and de-obfuscating obfuscated code, and designs and implements a prototype system that combines static and dynamic analysis. The system detects obfuscation through static analysis based on anomaly detection and de-obfuscates through dynamic analysis based on variable analysis. Static analysis is trained on benign samples only and applies three machine learning algorithms, Principal Component Analysis (PCA), One-Class Support Vector Machine (OCSVM), and K-Nearest Neighbor (K-NN), to detect obfuscation. Dynamic analysis proceeds in two steps: first, the nodes of the obfuscated code's Abstract Syntax Tree (AST) are traversed; second, the variables associated with each node are tracked and analyzed according to the node type, and their final values are used to de-obfuscate the code. A total of 80,574 benign and obfuscated malicious JavaScript samples (JavaScript-based pages) were collected from a real network environment for evaluation. Extensive experiments show that, with the PCA algorithm, the system detects obfuscated malicious JavaScript code with a detection rate of 99.90% at a false positive rate of 0.1%. Meanwhile, the proposed de-obfuscation method effectively de-obfuscates more than 80% of the obfuscated code.
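The detection step is one-class anomaly detection: a model is fitted to benign scripts only, and anything that deviates too far from it is flagged. The Python sketch below illustrates the PCA variant under stated assumptions. The five features in `extract_features` are hypothetical stand-ins (the abstract does not list the paper's feature set); the anomaly score is assumed to be the squared reconstruction error outside the top-`k` principal components; and the threshold is assumed to be the `(1 - fpr)` quantile of benign training scores, so that roughly a fraction `fpr` of benign samples would be flagged. Power iteration replaces a linear-algebra library so that the sketch needs only the standard library.

```python
import math
import random
import re


def extract_features(src: str) -> list[float]:
    """Hypothetical static features of a script (not the paper's feature set)."""
    n = max(len(src), 1)
    counts: dict[str, int] = {}
    for ch in src:
        counts[ch] = counts.get(ch, 0) + 1
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    escapes = len(re.findall(r"\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}|%[0-9a-fA-F]{2}", src))
    calls = len(re.findall(r"\b(?:eval|unescape|fromCharCode|atob)\b", src))
    lines = src.splitlines() or [""]
    avg_line = sum(len(ln) for ln in lines) / len(lines)
    symbols = sum(1 for ch in src if not ch.isalnum() and not ch.isspace()) / n
    # Escape and call counts are normalized per 1,000 characters.
    return [entropy, escapes * 1000 / n, calls * 1000 / n, avg_line, symbols]


class PCADetector:
    """One-class detector: fit on benign vectors only, flag large PCA residuals."""

    def __init__(self, k: int = 2, fpr: float = 0.001, iters: int = 200):
        self.k, self.fpr, self.iters = k, fpr, iters

    def _standardize(self, x: list[float]) -> list[float]:
        return [(v - m) / s for v, m, s in zip(x, self.mean, self.std)]

    def fit(self, benign: list[list[float]]) -> "PCADetector":
        n, d = len(benign), len(benign[0])
        self.mean = [sum(r[j] for r in benign) / n for j in range(d)]
        # Constant features get std 1.0, so any deviation on them lands in the residual.
        self.std = [math.sqrt(sum((r[j] - self.mean[j]) ** 2 for r in benign) / n) or 1.0
                    for j in range(d)]
        z = [self._standardize(r) for r in benign]
        cov = [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]
        rng = random.Random(0)
        self.components: list[list[float]] = []
        for _ in range(self.k):
            # Power iteration finds the dominant eigenvector of the (deflated) covariance.
            v = [rng.random() + 0.1 for _ in range(d)]
            for _ in range(self.iters):
                w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
                norm = math.sqrt(sum(t * t for t in w)) or 1.0
                v = [t / norm for t in w]
            lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
            self.components.append(v)
            # Deflation: subtract lam * v v^T so the next pass finds the next component.
            cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        # Threshold at the (1 - fpr) quantile of benign training scores.
        scores = sorted(self.score(r) for r in benign)
        idx = min(n - 1, max(0, math.ceil((1 - self.fpr) * n) - 1))
        self.threshold = scores[idx]
        return self

    def score(self, x: list[float]) -> float:
        """Squared reconstruction error of x outside the top-k principal subspace."""
        z = self._standardize(x)
        proj = [0.0] * len(z)
        for v in self.components:
            c = sum(a * b for a, b in zip(z, v))
            proj = [p + c * b for p, b in zip(proj, v)]
        return sum((a - p) ** 2 for a, p in zip(z, proj))

    def is_obfuscated(self, x: list[float]) -> bool:
        return self.score(x) > self.threshold


benign = [
    "var total = 0;\nfor (var i = 0; i < items.length; i++) {\n  total += items[i].price;\n}",
    "function greet(name) {\n  return 'Hello, ' + name;\n}\nconsole.log(greet('world'));",
    "document.getElementById('btn').addEventListener('click', function () {\n  alert('ok');\n});",
    "var menu = document.querySelector('.menu');\nmenu.classList.toggle('open');",
    "function add(a, b) { return a + b; }\nvar sum = add(2, 3);",
    "window.onload = function () {\n  init();\n  render();\n};",
]
suspicious = "eval(unescape('%76%61%72%20%61%3d%31%3b%64%6f%63%75%6d%65%6e%74%2e%77%72%69%74%65%28%61%29'))"

detector = PCADetector(k=2).fit([extract_features(s) for s in benign])
print(detector.is_obfuscated(extract_features(suspicious)))
```

Because `fit` sees only benign data, the false positive rate is controlled directly through the threshold quantile; with only six training samples, as here, the threshold is simply the largest benign score. A One-Class SVM decision function or a K-NN distance could replace `score` without changing the rest of this pipeline.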
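The de-obfuscation step walks the AST and, depending on each node's type, records the values assigned to variables so that the final value reaching a sink such as `eval` can be recovered. The Python sketch below illustrates that idea as simple constant propagation over ESTree-shaped dicts (the node format emitted by JavaScript parsers such as Acorn or Esprima); a hand-built tree stands in for parser output. It is a deliberate simplification, not the paper's implementation: it handles only straight-line code, string/integer `+`, `unescape`, and `String.fromCharCode`, and it evaluates values statically, whereas the paper's dynamic analysis tracks values during execution. All function names (`walk`, `evaluate`, `deobfuscate`, and the node builders) are hypothetical.

```python
import re

# JavaScript unescape(): %uXXXX and %XX map directly to character codes.
UNESCAPE_RE = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")


def walk(node):
    """Yield ESTree nodes depth-first in source order (pre-order)."""
    yield node
    for key, child in node.items():
        if key == "type":
            continue
        for c in child if isinstance(child, list) else [child]:
            if isinstance(c, dict) and "type" in c:
                yield from walk(c)


def evaluate(node, env):
    """Return the constant value (str or int) of an expression node, or None."""
    t = node["type"]
    if t == "Literal":
        v = node["value"]
        return v if isinstance(v, (str, int)) and not isinstance(v, bool) else None
    if t == "Identifier":
        return env.get(node["name"])
    if t == "BinaryExpression" and node["operator"] == "+":
        a, b = evaluate(node["left"], env), evaluate(node["right"], env)
        if a is None or b is None:
            return None
        # JS '+' concatenates if either side is a string; ints render identically.
        return f"{a}{b}" if isinstance(a, str) or isinstance(b, str) else a + b
    if t == "CallExpression":
        callee, args = node["callee"], [evaluate(x, env) for x in node["arguments"]]
        if callee["type"] == "Identifier" and callee["name"] == "unescape" \
                and args and isinstance(args[0], str):
            return UNESCAPE_RE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), args[0])
        if (callee["type"] == "MemberExpression" and not callee.get("computed")
                and callee["object"].get("name") == "String"
                and callee["property"].get("name") == "fromCharCode"
                and args and all(isinstance(c, int) for c in args)):
            return "".join(chr(c) for c in args)
    return None


def deobfuscate(program):
    """Track variable values by node type; return the strings passed to eval."""
    env, recovered = {}, []
    for node in walk(program):
        t = node["type"]
        if t == "VariableDeclarator" and node["id"]["type"] == "Identifier":
            init = node.get("init")
            env[node["id"]["name"]] = evaluate(init, env) if init else None
        elif t == "AssignmentExpression" and node["operator"] == "=" \
                and node["left"]["type"] == "Identifier":
            env[node["left"]["name"]] = evaluate(node["right"], env)  # later writes win
        elif t == "CallExpression" and node["callee"].get("type") == "Identifier" \
                and node["callee"].get("name") == "eval" and node["arguments"]:
            code = evaluate(node["arguments"][0], env)
            if isinstance(code, str):
                recovered.append(code)
    return recovered


# Hypothetical helpers that build ESTree-shaped nodes by hand.
def lit(v): return {"type": "Literal", "value": v}
def ident(n): return {"type": "Identifier", "name": n}
def call(callee, *args): return {"type": "CallExpression", "callee": callee, "arguments": list(args)}
def stmt(expr): return {"type": "ExpressionStatement", "expression": expr}
def var(name, init):
    return {"type": "VariableDeclaration", "kind": "var",
            "declarations": [{"type": "VariableDeclarator", "id": ident(name), "init": init}]}


from_char_code = {"type": "MemberExpression", "computed": False,
                  "object": ident("String"), "property": ident("fromCharCode")}
program = {"type": "Program", "body": [
    var("a", lit("%61%6c%65")),
    var("b", {"type": "BinaryExpression", "operator": "+", "left": ident("a"), "right": lit("%72%74(1)")}),
    stmt(call(ident("eval"), call(ident("unescape"), ident("b")))),
    stmt(call(ident("eval"), call(from_char_code, *map(lit, [97, 108, 101, 114, 116])))),
]}
print(deobfuscate(program))  # → ['alert(1)', 'alert']
```

The hand-built `program` corresponds to the obfuscated snippet `var a = "%61%6c%65"; var b = a + "%72%74(1)"; eval(unescape(b)); eval(String.fromCharCode(97,108,101,114,116));`. Any expression that cannot be resolved evaluates to `None`, so unresolvable `eval` payloads are skipped rather than guessed.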

Authors: Ma Hongliang, Wang Wei, Han Zhen

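Affiliations: School of Computer and Information Technology, Beijing Jiaotong University; College of Information Science and Technology, Shihezi University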

Source: Chinese Journal of Computers (计算机学报), 2017, No. 7, pp. 1699-1713 (15 pages); indexed in EI, CSCD, and the Peking University core journal list

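Funding: Shanghai Key Laboratory of Integrated Administration Technologies for Information Security; Ministry of Education Program for Innovative Research Team in University (IRT201206); Specialized Research Fund for the Doctoral Program of Higher Education (20120009110007, 20120009120010); Ministry of Education Scientific Research Foundation for Returned Overseas Chinese Scholars (K14C300020); Fundamental Research Funds for the Central Universities (2015JBM025)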

Keywords: obfuscation; Web security; de-obfuscation; malicious Web page; anomaly detection; JavaScript

Chinese Library Classification: TP309 [Automation and Computer Technology: Computer System Architecture]
