Abstract
Traditional static detection of malicious JavaScript code requires a large amount of manual sample labeling. It also suffers from low classification accuracy, because the samples are highly redundant and the classifier generalizes poorly. To address these problems, a new active learning strategy for support vector machines (SVM), VASVM, is proposed. VASVM defines a value measure to optimize the selection of the most valuable samples and iteratively adjusts the class balance of the training set, which improves the generalization ability of the training set and speeds up the convergence of training. Building on this, the NE-SVM algorithm prunes (tailors) the training set selected by VASVM, reducing sample redundancy and further improving generalization. Combining the VASVM strategy with the NE-SVM algorithm yields the NE-VASVM system. Experimental results show that the NE-VASVM-based JavaScript malicious code detection system effectively reduces the manual labeling workload and improves classifier accuracy.
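The abstract does not give the VASVM value measure or the NE-SVM pruning rule, so the sketch below illustrates only the general pattern VASVM builds on: pool-based active learning for an SVM, where the unlabeled samples closest to the current decision boundary are queried first and a simple bias toward the under-represented class stands in for the balance adjustment. It is a minimal sketch under stated assumptions, not the authors' method: the linear Pegasos-style trainer is a swapped-in technique, and `value`, `balance_factor`, `oracle` and the toy 2-D data are hypothetical. NE-SVM pruning is not shown.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _augment(x: Sequence[float]) -> List[float]:
    # Append a constant 1.0 so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def decision(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed SVM score f(x) = w . [x, 1]; |f| is small near the decision boundary."""
    return sum(wi * xi for wi, xi in zip(w, _augment(x)))


def train_linear_svm(X: Sequence[Point], y: Sequence[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss, y in {-1, +1}."""
    rng = random.Random(seed)
    data = [(_augment(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            step += 1
            eta = 1.0 / (lam * step)
            violated = label * sum(wi * xi for wi, xi in zip(w, x)) < 1.0
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: regularizer gradient
            if violated:  # hinge-loss subgradient for a point inside the margin
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def active_learn(pool: Sequence[Point], oracle: Callable[[Point], int], n_init: int = 4,
                 n_rounds: int = 3, batch: int = 2, balance_factor: float = 0.5,
                 seed: int = 0) -> Tuple[List[float], Dict[int, int]]:
    """Query `batch` labels per round, retrain, and return (weights, {pool index: label})."""
    rng = random.Random(seed)
    unlabeled = list(range(len(pool)))
    rng.shuffle(unlabeled)
    labeled: Dict[int, int] = {i: oracle(pool[i]) for i in unlabeled[:n_init]}
    unlabeled = unlabeled[n_init:]

    for _ in range(n_rounds):
        if not unlabeled:
            break
        if len(set(labeled.values())) < 2:
            picks = unlabeled[:batch]  # one class only: cannot train yet, query at random
        else:
            idx = list(labeled)
            w = train_linear_svm([pool[i] for i in idx], [labeled[i] for i in idx], seed=seed)
            n_pos = sum(1 for v in labeled.values() if v > 0)
            n_neg = len(labeled) - n_pos
            minority = 0 if n_pos == n_neg else (1 if n_pos < n_neg else -1)

            def value(i: int) -> float:
                # HYPOTHETICAL stand-in for VASVM's value measure (not given in the
                # abstract): lower is better; near the boundary, minority side favoured.
                f = decision(w, pool[i])
                return abs(f) * (balance_factor if f * minority > 0 else 1.0)

            picks = sorted(unlabeled, key=value)[:batch]
        for i in picks:
            labeled[i] = oracle(pool[i])  # the manual labeling cost is paid here
            unlabeled.remove(i)

    idx = list(labeled)
    w = train_linear_svm([pool[i] for i in idx], [labeled[i] for i in idx], seed=seed)
    return w, labeled


# Toy demo: two well-separated 2-D clusters stand in for benign/malicious feature vectors.
def oracle(p: Point) -> int:
    return 1 if p[0] + p[1] > 0 else -1


pos = [(2.0 + dx, 2.0 + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
pool = pos + [(-x, -y) for x, y in pos]
w, labeled = active_learn(pool, oracle)
print(f"queried {len(labeled)} of {len(pool)} labels")
```

In a real detector the `oracle` call is where an analyst labels a script, so the budget `n_init + n_rounds * batch` (10 of the 18 toy points here) is the labeling workload. A kernel SVM such as scikit-learn's `SVC`, scored with its `decision_function`, would replace the toy linear trainer.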
Authors
GUAN Heng
LI Linjun
ZHANG Lin
(School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; Traffic Police Corps, Jiangsu Provincial Public Security Department, Nanjing 210049, China)
Source
Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)
Peking University core journal
2019, No. 3, pp. 82-90 (9 pages)
Funding
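National Natural Science Foundation of China (61402241)
Natural Science Research Project of Higher Education Institutions of Jiangsu Province (17KJB520026)
Scientific Research Fund of Nanjing University of Posts and Telecommunications (NY217050)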
Keywords
support vector machines (SVM)
active learning
value measure
training set tailoring