Abstract
Software vulnerabilities pose a serious threat to network and information security, and their root cause lies in software source code. Existing traditional static detection tools and deep-learning-based detection methods do not fully represent code features and rely on simple word embedding to transform the code representation, so their detection results suffer from low accuracy and a high false positive rate or a high false negative rate. Therefore, a source code vulnerability detection method based on hybrid code representation was proposed to address incomplete code representation and improve detection performance. Firstly, source code was compiled into Intermediate Representation (IR), and the program dependence graph was extracted. Then, structured features were obtained through program slicing based on data flow and control flow analysis; at the same time, unstructured features were obtained by embedding node statements with doc2vec. Next, a Graph Neural Network (GNN) was used to learn the hybrid features. Finally, the trained GNN was used for prediction and classification. To verify the effectiveness of the proposed method, experimental evaluation was performed on the Software Assurance Reference Dataset (SARD) and on real-world datasets, where the F1 score of the detection results reached 95.3% and 89.6%, respectively. Experimental results show that the proposed method has good vulnerability detection ability.
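The program slicing step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy dependence graph, the statement comments, and the names `PDG` and `backward_slice` are all hypothetical, and the example assumes only that a slice is the set of statements a chosen statement transitively depends on through data or control dependence edges.

```python
from collections import deque

# Hypothetical toy program dependence graph (PDG).
# Key: statement id. Value: ids of the statements it depends on,
# through either data dependence or control dependence.
PDG = {
    1: [],         # n = read_len()
    2: [],         # buf = alloc(16)
    3: [1],        # if n > 0:
    4: [1, 2, 3],  #     copy(buf, src, n)
    5: [],         # log("done")
}


def backward_slice(pdg, criterion):
    """Return the sorted ids of all statements that `criterion`
    transitively depends on, including `criterion` itself."""
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        node = queue.popleft()
        for dep in pdg[node]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)
```

Slicing on the `copy` call (statement 4) keeps statements 1 to 4 and drops the unrelated logging statement, which is the kind of reduced, dependence-preserving subgraph that a later learning stage could consume.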
Authors
张琨
杨丰玉
钟发
曾广东
周世健
ZHANG Kun; YANG Fengyu; ZHONG Fa; ZENG Guangdong; ZHOU Shijian (School of Software, Nanchang Hangkong University, Nanchang, Jiangxi 330063, China)
Source
《计算机应用》
CSCD
Peking University Core Journals
2023, No. 8, pp. 2517-2526 (10 pages)
Journal of Computer Applications
Funding
Supported by the Natural Science Foundation of Jiangxi Province (20212BAB212009).
Keywords
vulnerability detection
Intermediate Representation (IR)
representation learning
Graph Neural Network (GNN)
deep learning