SEMBeF: A Sensitive and Efficient Malware Behavior Detection Framework based on Sliced Recurrent Neural Network
Cited by: 4
Abstract: Word vectors and recurrent neural networks (RNNs) can capture semantic and temporal information and have achieved great success in natural language recognition. The sequence of API calls that code produces at run time likewise reflects its real intent, so we apply these techniques to malicious code detection, aiming for high accuracy while reducing the manual work of extracting and analyzing code features. Three problems remain: 1) much malware deliberately breaks the normal context by randomly mixing calls to sensitive and non-sensitive APIs, so treating the two kinds of API equally may cause missed detections; 2) to capture code behavior as comprehensively as possible, the API sequences collected at run time are long, which makes RNN training slow; 3) the softmax classifier commonly used with classical RNNs generalizes poorly, leaving room to improve accuracy. To address these problems, this paper proposes SEMBeF, a Sensitive and Efficient Malware Behavior detection Framework based on the Sliced Recurrent Neural Network (SRNN). In SEMBeF, we propose a sensitive word vector algorithm that increases the weights of security-sensitive APIs, so that the code representation contains both context information and security-sensitivity weight information. We also propose an SGRU-SVM network structure, which uses parallel computation to greatly reduce the long training time caused by long API call sequences and improves detection accuracy. Finally, we optimize sample balance and hyperparameter selection, which further improves detection accuracy. We also implement a SEMBeF proof-of-concept (PoC) system. Experiments show that, compared with other deep learning methods based on classical word vector space models and RNNs, common machine learning methods, and other common deep neural network models, SEMBeF achieves the highest detection accuracy and significantly better training efficiency. Its detection accuracy and training time are 99.40% and 210 minutes, respectively; compared with a traditional RNN (GRU) model, accuracy increases by 0.48% and training time decreases by 96.6%.
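The abstract names two ideas that a small example can make concrete: boosting the vectors of security-sensitive APIs, and slicing a long API sequence so the slices can be processed independently. The abstract does not give the actual algorithms, so the following is only a minimal sketch under stated assumptions. The API list, the weight value, the toy embeddings, and the function names `weight_vectors` and `slice_sequence` are all hypothetical and are not taken from the paper.

```python
# Illustrative sketch only. The sensitive-API list, the weight, and the toy
# embeddings are hypothetical placeholders, not values from the paper.

SENSITIVE_APIS = {"WriteProcessMemory", "CreateRemoteThread"}  # hypothetical
SENSITIVE_WEIGHT = 2.0  # hypothetical boost factor


def weight_vectors(api_calls, embeddings, weight=SENSITIVE_WEIGHT):
    """Return one vector per call, scaling sensitive APIs by `weight`."""
    weighted = []
    for name in api_calls:
        factor = weight if name in SENSITIVE_APIS else 1.0
        weighted.append([factor * x for x in embeddings[name]])
    return weighted


def slice_sequence(items, num_slices):
    """Split a sequence into `num_slices` contiguous, near-equal slices."""
    if num_slices <= 0:
        raise ValueError("num_slices must be positive")
    size, extra = divmod(len(items), num_slices)
    slices, start = [], 0
    for i in range(num_slices):
        # The first `extra` slices take one additional element each.
        end = start + size + (1 if i < extra else 0)
        slices.append(items[start:end])
        start = end
    return slices


calls = ["CreateFileW", "WriteProcessMemory", "ReadFile", "CreateRemoteThread"]
embeddings = {name: [1.0, 0.5] for name in calls}  # toy 2-d vectors
vectors = weight_vectors(calls, embeddings)
slices = slice_sequence(vectors, 2)
```

In a sliced RNN, each slice would then be fed to its own recurrent unit, and because the slices do not depend on one another, those units can run in parallel before their outputs are combined. That independence is where the reduction in training time would come from.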
Authors: ZHAN Jing; FAN Xue; LIU Yifan; ZHANG Qian (School of Computer, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Trusted Computing, Beijing 100124, China; National Engineering Laboratory of Key Technologies of Information Security Grade Protection, Beijing 100124, China)
Source: Journal of Cyber Security (《信息安全学报》), CSCD, 2019, No. 6, pp. 67-79 (13 pages)
Funding: Supported by the National Key R&D Program of China (No. 2016YFB0800204), the Open Project of the Information Security Laboratory of National Defense Research and Experiment (No. 2016XXAQ08), and the National High Technology Research and Development Program of China (No. 2015AA016002)
Keywords: malware behavior detection; API sequence; sensitive word vector space model; sliced recurrent neural network (SRNN)