Source code vulnerability detection based on hybrid code representation
Abstract: Software vulnerabilities pose a serious threat to network and information security, and their root cause lies in software source code. Existing traditional static detection tools and deep-learning-based detection methods do not represent code features completely and convert code representations with simple word embeddings, so their detection results suffer from low accuracy and a high false positive rate or a high false negative rate. To address incomplete code representation and improve detection performance, a source code vulnerability detection method based on hybrid code representation was proposed. First, the source code was compiled into Intermediate Representation (IR), and the program dependence graph was extracted. Then, structural features were obtained through program slicing based on data flow and control flow analysis, while unstructured features were obtained by embedding node statements with doc2vec. Next, a Graph Neural Network (GNN) was used to learn the hybrid features. Finally, the trained GNN was used for prediction and classification. To verify the effectiveness of the proposed method, experiments were performed on the Software Assurance Reference Dataset (SARD) and real-world datasets, where the F1 scores reached 95.3% and 89.6%, respectively. The experimental results show that the proposed method has good vulnerability detection ability.
Authors: ZHANG Kun; YANG Fengyu; ZHONG Fa; ZENG Guangdong; ZHOU Shijian (School of Software, Nanchang Hangkong University, Nanchang, Jiangxi 330063, China)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2023, Issue 8, pp. 2517-2526 (10 pages)
Funding: Natural Science Foundation of Jiangxi Province (20212BAB212009).
Keywords: vulnerability detection; Intermediate Representation (IR); representation learning; Graph Neural Network (GNN); deep learning
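
The abstract names program slicing over a program dependence graph as the source of the structural features. As a rough illustration of that idea only, the sketch below computes a backward slice on a toy dependence graph. It is not the authors' implementation: the function name `backward_slice`, the `TOY_DEPS` graph, and the five-statement program in the comments are all hypothetical, and the paper works on compiler IR rather than on hand-written statement ids.

```python
from collections import deque


def backward_slice(deps, criterion):
    """Return every statement the criterion transitively depends on.

    deps maps a statement id to the set of statement ids it depends on,
    through either data dependence or control dependence (assumed merged
    into one edge set here for simplicity).
    """
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        node = queue.popleft()
        for pred in deps.get(node, ()):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return seen


# Hypothetical toy program (statement ids are illustrative only):
#   1: n = read()
#   2: buf = alloc(8)
#   3: if n < 8:
#   4:     log("ok")
#   5: copy(buf, src, n)
TOY_DEPS = {
    3: {1},      # the branch condition uses n
    4: {3},      # log is control dependent on the branch
    5: {1, 2},   # copy uses n and buf
}

# Slice with respect to the potentially unsafe copy at statement 5.
slice_for_copy = sorted(backward_slice(TOY_DEPS, 5))
print(slice_for_copy)
```

In this toy case the slice for statement 5 keeps statements 1, 2, and 5 and drops the branch and the log call, which do not influence the copy. A slice like this would then supply the graph structure, while a statement embedding such as doc2vec would supply the per-node features.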