Abstract
To address the weak automatic feature extraction and slow family identification of traditional large-scale malware analysis methods, this paper studies the behavioral composition and code fragment distribution of a large number of samples using dynamic and static analysis, and proposes an online automatic analysis model for massive malware collections based on feature clustering. The model comprises a feature space construction method based on API behavior and code fragments, an automatic feature extraction algorithm, and a nearest-neighbor clustering algorithm based on locality-sensitive hashing (LSH). Each sample is described by its API calls and key code fragments, and the clustering step groups samples that exhibit similar features. Experiments on real-world malware collections show that the model extracts features automatically from large-scale samples, supports online (streaming) clustering, and identifies malware families with high accuracy. A prototype system built on the model proved practical.
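As a rough illustration of the LSH-based nearest-neighbor clustering step, the sketch below groups samples whose feature sets are similar. It is only a sketch under stated assumptions: the abstract does not say which LSH family the authors use, so MinHash with banding is swapped in here as one common choice. The function names, the parameters `NUM_HASHES` and `BANDS`, and the sample feature strings are all hypothetical. Feature sets are assumed to be non-empty.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 32  # assumed signature length (hypothetical parameter)
BANDS = 8        # assumed number of LSH bands, 4 rows each (hypothetical)


def _hash(seed, token):
    """Seeded 64-bit hash of a feature string, built on BLAKE2b."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(features):
    """MinHash signature of a non-empty set of feature strings."""
    return tuple(
        min(_hash(seed, f) for f in features) for seed in range(NUM_HASHES)
    )


def lsh_cluster(samples):
    """Cluster samples (name -> set of features) that share any LSH bucket.

    Samples are processed one at a time, so the same loop could consume
    a stream of incoming samples.
    """
    parent = {name: name for name in samples}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    rows = NUM_HASHES // BANDS
    buckets = {}  # (band index, band slice of signature) -> first sample seen
    for name, features in samples.items():
        signature = minhash_signature(features)
        for band in range(BANDS):
            key = (band, signature[band * rows:(band + 1) * rows])
            if key in buckets:
                # Bucket collision: merge with the sample already there.
                parent[find(name)] = find(buckets[key])
            else:
                buckets[key] = name

    clusters = defaultdict(set)
    for name in samples:
        clusters[find(name)].add(name)
    return list(clusters.values())


if __name__ == "__main__":
    demo = {
        "sample_a": {"CreateFileW", "WriteFile", "RegSetValueExW"},
        "sample_b": {"CreateFileW", "WriteFile", "RegSetValueExW"},
        "sample_c": {"socket", "connect", "send"},
    }
    print(lsh_cluster(demo))
```

With more bands and fewer rows per band, less similar samples collide more often, so these two parameters trade recall against precision. Suitable values would have to be tuned on real data.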
Source
Journal on Communications (《通信学报》)
EI
CSCD
PKU Core
2013, No. 8, pp. 146-153 (8 pages)
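Funding
National High Technology Research and Development Program of China ("863" Program) (2013AA014700)
National Key Technology R&D Program of China (2012BAH46B02)
Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030200)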