Abstract
In today's complex network environment, malicious code spreads rapidly through many channels, compromising user terminals and network devices and stealing private user data, which poses a serious security threat to networks and Internet users. Traditional detection methods have difficulty detecting unknown malicious code, and the diversity and sheer number of malware variants make that task even harder. We propose an unsupervised malware identification method. By analyzing disassembled PE files, we derive standardization rules for assembly instructions and then apply latent Dirichlet allocation (LDA) to obtain the latent “document-topic” and “topic-word” distributions in the assembly instructions. The topic distributions are used as sample features, which yields a new malware detection framework. Using perplexity together with a varying step size, we give a fast method for evaluating and automatically determining the optimal number of topics, which removes the need to specify the number of topics in the LDA model in advance. We also analyze the semantic interpretability of the “document-topic” and “topic-word” clustering results, showing that the sample features obtained by this method carry latent semantics. Experimental results show that, compared with other methods, the proposed method achieves comparable or better malware discrimination and accurately identifies new malware variants.
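The abstract states that the number of topics is chosen automatically from perplexity using a varying step size, but it does not spell out the procedure. The following is a minimal sketch of one plausible reading, a coarse-to-fine search that shrinks the step around the best candidate found so far. The function name `select_topic_count`, its parameters, and the halving schedule are assumptions made for illustration and are not taken from the paper.

```python
from typing import Callable, Dict, Tuple


def select_topic_count(
    perplexity_of: Callable[[int], float],
    k_min: int,
    k_max: int,
    initial_step: int,
) -> Tuple[int, Dict[int, float]]:
    """Coarse-to-fine search for the topic count with the lowest perplexity.

    perplexity_of(k) is a caller-supplied (hypothetical) function that trains
    a topic model with k topics and returns its held-out perplexity.
    Returns the selected k and every (k, perplexity) pair that was evaluated.
    """
    if k_min < 1 or k_max < k_min or initial_step < 1:
        raise ValueError("need 1 <= k_min <= k_max and initial_step >= 1")

    evaluated: Dict[int, float] = {}

    def score(k: int) -> float:
        # Each model is trained at most once; repeated candidates hit the cache.
        if k not in evaluated:
            evaluated[k] = perplexity_of(k)
        return evaluated[k]

    lo, hi, step = k_min, k_max, initial_step
    while True:
        # Sample the current window on a grid of width `step`, including `hi`.
        candidates = list(range(lo, hi + 1, step))
        if candidates[-1] != hi:
            candidates.append(hi)
        best = min(candidates, key=score)
        if step == 1:
            return best, evaluated
        # Narrow the window to one step on either side of the best point,
        # then halve the step (assumed schedule) for the next pass.
        lo = max(k_min, best - step)
        hi = min(k_max, best + step)
        step = max(1, step // 2)
```

In practice `perplexity_of` would fit an LDA model on the standardized instruction corpus and report perplexity on held-out samples. The search assumes the perplexity curve has a single broad minimum; on a noisy curve it can settle on a local minimum, so the result is a heuristic choice and not a guaranteed optimum.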
Authors
Liu Yashu (刘亚姝)
Wang Zhihai (王志海)
Hou Yueran (侯跃然)
Yan Hanbing (严寒冰)
Liu Yashu; Wang Zhihai; Hou Yueran; Yan Hanbing (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044; School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044; Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876; National Computer Network Emergency Response Technical Team Coordination Center of China, Beijing 100029)
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journals (北大核心)
2019, No. 11, pp. 2339-2348 (10 pages)
Journal of Computer Research and Development
Funding
National Key Research and Development Program of China (2018YFB0803604, 2018YFB0804704)
National Natural Science Foundation of China (U1736218, 61672086)