Detection of Embedded Malware Based on C4.5 Decision Tree

Times cited: 7


Abstract: Because it is highly concealed and hard to detect, embedded malware has become a new threat to computer security. Earlier statistical analysis methods do not fully account for two characteristics of embedded malware: it occupies only a small number of bytes, and those bytes carry high information gain. To address this, the paper proposes a detection method based on the C4.5 decision tree. The 500 3-grams with the highest information gain are extracted from the training samples as attribute features, a decision tree is built on these attributes, and the tree is then used to detect unknown embedded malware. Experimental results show that the proposed method has a clear advantage over existing methods in both detection rate and classification accuracy, reaching a detection rate of 99.80% on Word documents infected with embedded malware.
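The pipeline described in the abstract (byte 3-grams, keep the 500 with the highest information gain, build a C4.5 tree, classify unseen files) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each training file is represented by the *presence* of byte 3-grams, and it uses a simplified C4.5 that picks splits by gain ratio but omits pruning, the average-gain filter, and continuous attributes. The function names (`byte_ngrams`, `select_top_k`, `build_tree`, `classify`) and the toy byte strings are hypothetical.

```python
import math
from collections import Counter


def byte_ngrams(data: bytes, n: int = 3) -> set:
    """Distinct byte n-grams of a sample (assumption: presence only, not counts)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(values) -> float:
    """Shannon entropy in bits of a sequence of discrete values."""
    total = len(values)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def info_gain(rows, labels, gram) -> float:
    """Information gain of splitting the samples on presence/absence of `gram`."""
    yes = [y for s, y in zip(rows, labels) if gram in s]
    no = [y for s, y in zip(rows, labels) if gram not in s]
    n = len(labels)
    remainder = len(yes) / n * entropy(yes) + len(no) / n * entropy(no)
    return entropy(labels) - remainder


def select_top_k(rows, labels, k=500):
    """Rank every n-gram seen in training by information gain; keep the top k."""
    vocab = set().union(*rows)
    # Ties are broken by the gram bytes so the ranking is deterministic.
    return sorted(vocab, key=lambda g: (-info_gain(rows, labels, g), g))[:k]


def build_tree(rows, labels, features):
    """Simplified C4.5: split on the feature with the highest gain ratio, no pruning.
    A leaf is a class label; an inner node is (gram, present_subtree, absent_subtree)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best, best_ratio = None, 0.0
    for g in features:
        split_info = entropy([g in s for s in rows])  # entropy of the partition itself
        gain = info_gain(rows, labels, g)
        if split_info > 0 and gain > 0 and gain / split_info > best_ratio:
            best, best_ratio = g, gain / split_info
    if best is None:  # no feature separates the remaining samples
        return majority
    rest = [g for g in features if g != best]
    yes = [(s, y) for s, y in zip(rows, labels) if best in s]
    no = [(s, y) for s, y in zip(rows, labels) if best not in s]
    return (best,
            build_tree([s for s, _ in yes], [y for _, y in yes], rest),
            build_tree([s for s, _ in no], [y for _, y in no], rest))


def classify(tree, grams):
    """Walk the tree using the n-gram set of an unseen sample."""
    while isinstance(tree, tuple):
        gram, present, absent = tree
        tree = present if gram in grams else absent
    return tree


if __name__ == "__main__":
    # Hypothetical toy data: "infected" samples share an injected byte sequence.
    benign = [b"hello world text", b"plain document body", b"another benign file"]
    infected = [b"hello\x90\x90\xebworld", b"doc\x90\x90\xebbody", b"file\x90\x90\xeb"]
    rows = [byte_ngrams(d) for d in benign + infected]
    labels = [0] * len(benign) + [1] * len(infected)  # 1 = malicious
    top = select_top_k(rows, labels, k=500)
    tree = build_tree([s & set(top) for s in rows], labels, top)
    print(top[0], classify(tree, byte_ngrams(b"x\x90\x90\xeby")))
```

On this toy data the injected 3-gram `b"\x90\x90\xeb"` is the only one present in every infected sample and absent from every benign one, so it ranks first by information gain and becomes the root split. Using presence features and gain ratio mirrors the abstract's point that a few high-information-gain bytes can separate infected documents; real Word files would need far more samples and a pruned tree.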

Authors: Zhang Fuyong, Qi Deyu, Hu Jinglin

Affiliation: Institute of Computer Systems, South China University of Technology

Source: Journal of South China University of Technology (Natural Science Edition), 2011, No. 5, pp. 68-72 (5 pages). Indexed in EI, CAS, CSCD; Peking University core journal.

Funding: National Technology Innovation Fund project (08C26214411198); Guangdong-Hong Kong Key Areas Breakthrough Project (2008A011400010)

Keywords: embedded malware; malware detection; C4.5 decision tree; Boosting algorithm

CLC number: TP309 [Automation and Computer Technology: Computer System Architecture]
