Abstract
Automatic detection of source code vulnerabilities is an important research topic. Most existing solutions are based on linear models: they rely on the textual information of source code and ignore its syntactic structure, which loses syntactic and semantic information and misses many vulnerability features. This paper proposes Astor, an intelligent vulnerability detection system based on structured representation, which uses the structural information of source code, specifically the abstract syntax tree (AST), to detect vulnerabilities. First, we construct a data set that is derived from source code and preserves its syntactic structure, and we propose a depth-first traversal scheme to obtain the syntactic and semantic representation of the AST. Then, a neural network model learns this AST representation. To evaluate Astor, we compare several vulnerability detection systems based on structured data and on linear data. The results show that Astor effectively improves vulnerability detection, yielding lower false negative and false positive rates than the other approaches. We further conclude that structured models are better suited to long, information-rich data.
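As a rough illustration of the depth-first extraction step, the sketch below flattens an AST into a sequence of node type names by pre-order traversal. It is not the Astor implementation: the abstract does not name the target language, the parser, or the exact token format, so Python's built-in `ast` module stands in for the parser and the function name `dfs_node_sequence` is hypothetical.

```python
import ast
from typing import List


def dfs_node_sequence(source: str) -> List[str]:
    """Flatten the AST of `source` into node type names, depth-first (pre-order).

    Assumption: node type names are used as tokens; the paper's actual
    representation may carry more information (identifiers, literals, etc.).
    """
    tree = ast.parse(source)
    sequence: List[str] = []

    def visit(node: ast.AST) -> None:
        # Record the parent before descending, so structure is preserved
        # as ordering in the resulting sequence.
        sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return sequence


if __name__ == "__main__":
    print(dfs_node_sequence("x = a + 1"))
```

A sequence of this kind could then be embedded and fed to a sequence model; which neural architecture Astor uses is not stated in the abstract.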
Authors
陈肇炫
邹德清
李珍
金海
CHEN Zhaoxuan; ZOU Deqing; LI Zhen; JIN Hai (National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Clusters and Grid Computing Lab, Big Data Security Engineering Research Center, Wuhan 430074, China; School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China; Institute of Huazhong University of Science and Technology, Shenzhen 518000, China)
Source
《信息安全学报》
CSCD
2020, Issue 4, pp. 1-13 (13 pages)
Journal of Cyber Security
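Funding
National Natural Science Foundation of China (No. U1936211)
Shenzhen Fundamental Research Program (Discipline Layout) (No. JCYJ20170413114215614)
Guangdong Provincial Science and Technology Plan Project (No. 2017B010124001)
Key-Area Research and Development Program of Guangdong Province (No. 2019B010139001)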
Keywords
vulnerability detection
structured representation
abstract syntax tree
neural network