Abstract
As software grows more complex, the demand for research on vulnerability detection keeps increasing. Discovering and patching software vulnerabilities quickly minimizes the damage they cause. Deep learning-based vulnerability detection, an emerging approach, automatically learns implicit vulnerability patterns from vulnerable code and saves considerable manual effort. These methods are still immature, however. Function-level methods have coarse detection granularity and low detection accuracy. Slice-level methods effectively reduce sample noise but have two remaining problems: first, most existing methods are evaluated on artificial vulnerability datasets, so their ability to detect vulnerabilities in real environments remains in doubt; second, existing work only determines whether a slice sample contains a vulnerability and does not consider the interpretability of the detection results. To address these issues, this study proposes a slice-level vulnerability detection and interpretation method based on graph neural networks. The method first normalizes C/C++ source code and extracts slices to reduce interference from redundant information in the samples. Next, a graph neural network model embeds the slices into vector representations that preserve the structural information and vulnerability features of the source code. The vector representations are then fed into the vulnerability detection model for training and prediction. Finally, the trained detection model and the vulnerable slices to be explained are fed into a vulnerability interpreter, which outputs the specific vulnerable lines of code. Experimental results show that, for vulnerability detection, the method achieves an F1 score of 75.1% on real-world vulnerability data, 41.2% to 110.4% higher than the comparative methods. For vulnerability interpretation, the method reaches 73.6% accuracy when limited to the top 10% of critical nodes, which is 8.9% and 24.9% higher than the two comparison interpreters, with time overhead reduced by 42.5% and 15.4%, respectively. Finally, the method correctly detects and explains 59 real vulnerabilities in four open-source software projects, demonstrating its practicality for real-world vulnerability discovery.
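To make the reported detection figures easier to read, the following is a minimal Python sketch of how an F1 score is computed and of what the stated gain range implies. It assumes the 41.2% to 110.4% improvement is relative to each comparison method's F1 score (a gain above 100% cannot be expressed in percentage points). The function names and the confusion counts are hypothetical, and the baseline values it derives are implied by that assumption, not figures reported in the abstract.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def implied_baseline(score: float, relative_gain: float) -> float:
    """Baseline score b such that score = b * (1 + relative_gain).

    Assumption: the gain is relative, not in percentage points.
    """
    return score / (1.0 + relative_gain)


# Hypothetical confusion counts, for illustration only.
example_f1 = f1_score(tp=80, fp=20, fn=20)

# F1 of 75.1% and the gain range quoted in the abstract.
strongest_baseline = implied_baseline(75.1, 0.412)
weakest_baseline = implied_baseline(75.1, 1.104)

print(f"example F1: {example_f1:.3f}")
print(f"implied baseline F1 range: {weakest_baseline:.1f}% to {strongest_baseline:.1f}%")
```

Under the relative-gain assumption, the comparison methods would score an F1 of roughly 35.7% to 53.2% on the same real-world data.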
Authors
HU Yu-Tao (胡雨涛)
WANG Su-Yuan (王溯远)
WU Yue-Ming (吴月明)
ZOU De-Qing (邹德清)
LI Wen-Ke (李文科)
JIN Hai (金海)
Affiliations: National Engineering Research Center for Big Data Technology and System (Key Laboratory of Services Computing Technology and System, Ministry of Education, Huazhong University of Science and Technology), Wuhan 430074, China; Hubei Key Laboratory of Distributed System Security, Wuhan 430074, China; School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China; School of Computer Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
Source
Journal of Software (《软件学报》)
EI
CSCD
Peking University Core Journals (北大核心)
2023, No. 6, pp. 2543-2561 (19 pages)
Funding
National Natural Science Foundation of China (62172168)
Key Research and Development Program of Hubei Province (2021BAA032)
Keywords
vulnerability detection
deep learning
graph neural network (GNN)
explainable AI