Abstract
This paper studies the application of machine learning algorithms to the detection of unknown malware, as an alternative to traditional malware detection methods. Machine learning algorithms that rely on a single feature cannot make full use of their data-processing capability, and their detection performance is mediocre. Two-view co-training handles poorly the case where its two classifiers give opposite predictions for an unknown sample. The paper therefore adopts a three-view co-training algorithm: when the three classifiers disagree on an unknown sample, the label is decided by a vote on the principle that the minority yields to the majority, which gives fairly good results. The method reverse-engineers APK packages and extracts features, choosing three non-overlapping sub-views (permission request features, API call sequence features, and OpCode features) and selecting the best-performing algorithm for each sub-view to build its classifier. On this basis, following the Co-training approach, the three classifiers are trained cooperatively, so that the detection performance of all three individual classifiers improves together even when few labeled samples are available. 4,600 benign samples of various kinds were downloaded from the Android Market, and 4,360 recent malicious samples were downloaded from VirusShare, a malware sample sharing site. Ten groups of experiments were run, with the number of labeled samples ranging from 30 to 120, and about 1,800 samples were used for classification testing. The experimental results indicate that the proposed method achieves better detection performance.
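The voting and co-training idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy nearest-centroid classifier, the function names (`majority_vote`, `train_views`, `tri_view_cotrain`, `predict`), the batch size, and the rule of adding every majority-labeled sample to the labeled pool are all assumptions made for illustration. The abstract does not state which per-view algorithms or sample selection criteria the paper uses.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most classifiers (no ties with 3 binary votes)."""
    return Counter(predictions).most_common(1)[0][0]


class CentroidClassifier:
    """Toy nearest-centroid classifier; a hypothetical stand-in for the
    best-performing algorithm that would be selected for each view."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for features, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, features):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def train_views(labeled):
    """Train one classifier per view on samples of the form ((v1, v2, v3), label)."""
    labels = [label for _, label in labeled]
    return [
        CentroidClassifier().fit([views[k] for views, _ in labeled], labels)
        for k in range(3)
    ]


def tri_view_cotrain(labeled, unlabeled, batch=2):
    """Grow the labeled set by majority-voting on unlabeled samples (assumed rule)."""
    labeled, pool = list(labeled), list(unlabeled)
    while pool:
        clfs = train_views(labeled)
        chunk, pool = pool[:batch], pool[batch:]
        for views in chunk:
            votes = [clf.predict(view) for clf, view in zip(clfs, views)]
            labeled.append((views, majority_vote(votes)))
    # Retrain on the enlarged labeled set so all three classifiers benefit.
    return train_views(labeled), labeled


def predict(clfs, views):
    """Classify one sample by majority vote over the three view classifiers."""
    return majority_vote([clf.predict(view) for clf, view in zip(clfs, views)])
```

In this sketch each sample is a tuple of three feature vectors, standing in for the permission, API call sequence, and OpCode views. A sample on which one view disagrees is still labeled by the other two, which is the case a two-view scheme cannot resolve.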
Authors
王全民
张帅帅
杨晶
WANG Quan-min; ZHANG Shuai-shuai; YANG Jing (Department of Informatics, Beijing University of Technology, Beijing 100124, China)
Source
《计算机技术与发展》
2019, No. 1, pp. 135-139 (5 pages)
Computer Technology and Development
Funding
National Natural Science Foundation of China (61272500)