Automatic Classification of Computer Vulnerability Based on S-C Feature Extraction

Cited by: 3
Abstract In recent years the number of previously unknown computer vulnerabilities has grown massively, and analyzing and classifying large volumes of vulnerability data promptly and accurately is an important, still unsolved problem. This paper therefore proposes a text classification method for computer vulnerability descriptions that combines feature extraction based on information entropy and a comprehensive function (S-C) with an averaged one-dependence estimators (AODE) classifier, which accounts for the dependencies among feature words. First, feature words are extracted with the S-C method: a comprehensive function C, which combines a word's inter-class and intra-class importance, gives the word's importance to each class; the information entropy S of the word over the classes is then used to reduce the importance of words whose class distribution is scattered, yielding an accurate feature word set. Finally, the vulnerability data set is classified with AODE. Comparative experiments show that the S-C method extracts an accurate feature word set and that, combined with the AODE classifier, it achieves higher classification accuracy than traditional classifier models.
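The abstract does not give the formulas for C or S, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes inter-class importance is the share of a word's document occurrences that fall in a class, intra-class importance is the share of that class's documents containing the word, C is their product, and S is the Shannon entropy of the word's distribution over classes, used as a damping factor. The names `sc_scores` and `top_features` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def sc_scores(docs, labels):
    """Score (word, class) pairs with an assumed entropy-damped importance.

    docs: list of token lists; labels: parallel list of class names.
    All formulas here are illustrative assumptions, not the paper's.
    """
    classes = sorted(set(labels))
    docs_per_class = Counter(labels)

    # df[word][cls] = number of documents of class cls that contain word
    df = defaultdict(Counter)
    for tokens, cls in zip(docs, labels):
        for word in set(tokens):
            df[word][cls] += 1

    # Normalise entropy by its maximum so the damping factor lies in [0, 1].
    max_entropy = math.log(len(classes)) if len(classes) > 1 else 1.0

    scores = {}
    for word, per_class in df.items():
        total = sum(per_class.values())
        # Assumed S: entropy of the word's distribution over classes.
        entropy = -sum((n / total) * math.log(n / total)
                       for n in per_class.values())
        damping = 1.0 - entropy / max_entropy
        for cls, n in per_class.items():
            between = n / total               # assumed inter-class importance
            within = n / docs_per_class[cls]  # assumed intra-class importance
            # Assumed C = between * within, weakened by the entropy term.
            scores[(word, cls)] = between * within * damping
    return scores


def top_features(scores, k):
    """Return the k highest-scoring (word, class) pairs, ties broken by name."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pair for pair, _ in ranked[:k]]


# Toy vulnerability descriptions (made-up data for illustration only).
docs = [
    ["buffer", "overflow", "remote"],
    ["buffer", "overflow"],
    ["sql", "injection", "remote"],
    ["sql", "injection"],
]
labels = ["mem", "mem", "sqli", "sqli"]
scores = sc_scores(docs, labels)
selected = top_features(scores, 4)
```

In this toy data, "remote" appears equally in both classes, so its entropy is maximal and its score is damped to roughly zero, while class-specific words such as "buffer" keep their full score. The selected words could then be passed to any classifier; AODE itself is not sketched here.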
Authors REN Jiadong; WANG Qian; WANG Fei; LI Yazhou; LIU Jiaxin (College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066001, China; Computer Virtual Technology and System Integration Laboratory of Hebei Province, Qinhuangdao, Hebei 066001, China)
Source Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), indexed in CSCD and the Peking University Core Journals list, 2020, No. 7, pp. 1173-1182 (10 pages)
Funding National Natural Science Foundation of China, Nos. 61472341, 61772449, 61572420, 61807028, 61802332.
Keywords computer vulnerability; text classification; feature extraction; information entropy