
A Vulnerability Detection Approach Based on Comparative Learning (基于比较学习的漏洞检测方法)
Abstract: Source code vulnerability detection based on deep learning is a relatively efficient approach to vulnerability analysis, but it faces two challenges: obtaining a sufficiently large dataset and finding an effective learning method. This work addresses both. First, a multi-vulnerability dataset of 280,793 samples covering 150 CWE vulnerability types is constructed from the SARD dataset. Second, a deep learning method based on comparative learning is proposed. Its core idea is to construct, for every sample in the training set, one set of samples of the same type and one set of samples of different types, so that the model learns by comparison. Trained on data organized this way, a deep learning model can learn the many subtle features shared by samples of the same type and also extract the strongly discriminative features that separate samples of different types. In experiments, a model trained with the constructed dataset and the proposed method identifies 150 CWE vulnerability types with an accuracy of 92.0%, an average PR value of 0.85 according to the Chinese abstract (0.84 according to the English abstract), and an average ROC-AUC value of 0.96. The paper also analyzes and discusses code symbolization, a technique widely used in deep-learning-based vulnerability analysis. The experiments show that whether or not the code is symbolized during training does not affect the model's vulnerability identification accuracy.
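The core idea in the abstract, pairing each training sample with a same-type set and a different-type set, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the set sizes, the sampling strategy, or how the sets are fed to the model, so the function name `build_comparison_sets`, the parameters `k_same` and `k_diff`, uniform random sampling, and the toy code snippets are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_comparison_sets(samples, k_same=2, k_diff=2, seed=0):
    """Attach a same-type set and a different-type set to every sample.

    samples: list of (code_text, cwe_label) pairs.
    Returns one dict per sample holding indices into `samples`.
    Set sizes and uniform sampling are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Group sample indices by their CWE label.
    by_label = defaultdict(list)
    for idx, (_, label) in enumerate(samples):
        by_label[label].append(idx)

    result = []
    for idx, (_, label) in enumerate(samples):
        # Same-type candidates: same label, excluding the sample itself.
        same_pool = [i for i in by_label[label] if i != idx]
        # Different-type candidates: every sample with another label.
        diff_pool = [i for i in range(len(samples)) if samples[i][1] != label]
        same = rng.sample(same_pool, min(k_same, len(same_pool)))
        diff = rng.sample(diff_pool, min(k_diff, len(diff_pool)))
        result.append({"anchor": idx, "same": same, "diff": diff})
    return result


# Hypothetical toy data: (code fragment, CWE label).
toy = [
    ("strcpy(buf, src);", "CWE-121"),
    ("memcpy(buf, src, n);", "CWE-121"),
    ("free(p); free(p);", "CWE-415"),
    ("free(q); use(q);", "CWE-416"),
]
sets = build_comparison_sets(toy, k_same=1, k_diff=2)
for entry in sets:
    print(entry)
```

A label that occurs only once, such as `CWE-415` in the toy data, ends up with an empty same-type set. How the paper handles such rare types in its unbalanced dataset is not described in the abstract.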
Authors: Chen Xiaoquan (陈小全); Liu Jian (刘剑); Xia Xiangyu (夏翔宇); Zhou Shaoxiang (周绍翔) (Department of Information, Beijing City University, Beijing 100191; CAS Key Laboratory of Network Assessment Technology (Institute of Information Engineering, Chinese Academy of Sciences), Beijing 100093; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049)
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2023, No. 9, pp. 2152-2168 (17 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
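Funding: Open Project of the CAS Key Laboratory of Network Assessment Technology, Institute of Information Engineering, Chinese Academy of Sciences (KFKT2022-005); Strategic Priority Research Program of the Chinese Academy of Sciences (XDC02040100).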
Keywords: vulnerability detection; comparative learning; deep learning; unbalanced data; model checking
