Research on Source Code Vulnerability Detection Based on BERT Model


Abstract: Techniques such as code metrics, machine learning, and deep learning are commonly used in source code vulnerability detection. However, these techniques fail to retain the syntactic and semantic information in source code and require extensive expert knowledge to define vulnerability features. To address these problems, this paper proposes a source code vulnerability detection model based on BERT (bidirectional encoder representations from transformers). The model splits the source code under test into multiple small samples, converts each sample into a form that approximates natural language, uses the BERT model to extract vulnerability features from the source code automatically, and then trains a well-performing vulnerability classifier to detect multiple types of vulnerabilities in Python code. Across the different vulnerability types, the model achieves an average accuracy of 99.2%, precision of 97.2%, recall of 96.2%, and F1 score of 96.7%, a performance improvement of 2% to 14% over existing vulnerability detection methods. The experimental results show that the model is a general, lightweight, and scalable vulnerability detection method.
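The abstract does not say how the source code is split into small samples or how each sample is made to resemble natural language. The following is only a minimal sketch of one plausible pre-processing step, not the authors' implementation. It assumes that a sample is an overlapping window of lexical tokens joined by spaces. The function names `source_to_tokens` and `sliding_windows`, the window size, and the stride are all hypothetical.

```python
import io
import tokenize

# Token types that carry layout only; dropping them is an assumption of this sketch.
_SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def source_to_tokens(source: str) -> list:
    """Flatten Python source into a list of token strings (hypothetical helper)."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in _SKIPPED:
            tokens.append(tok.string)
    return tokens


def sliding_windows(tokens, size=8, stride=4):
    """Split a token list into overlapping, space-joined samples (hypothetical helper).

    `size` and `stride` are illustrative values, not taken from the paper.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if len(tokens) <= size:
        return [" ".join(tokens)] if tokens else []
    samples = []
    # The range bound ensures the final window reaches the last token.
    for start in range(0, len(tokens) - size + stride, stride):
        samples.append(" ".join(tokens[start:start + size]))
    return samples


if __name__ == "__main__":
    code = (
        'query = "SELECT * FROM users WHERE id = " + user_id\n'
        "cursor.execute(query)\n"
    )
    for sample in sliding_windows(source_to_tokens(code)):
        print(sample)
```

Each resulting string could then be passed to a BERT tokenizer and a classification head. That stage is omitted here because the abstract gives no details about the model configuration or training setup.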

Authors: Luo Leqi; Zhang Yanshuo; Wang Zhiqiang; Wen Jin; Xue Peiyang (Beijing Electronic Science and Technology Institute, Beijing 100070)

Affiliation: Beijing Electronic Science and Technology Institute

Source: Journal of Information Security Research (《信息安全研究》), CSCD, Peking University Core Journal, 2024, No. 4, pp. 294-301 (8 pages)

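Funding: China Postdoctoral Science Foundation General Program (2019M650606); Fundamental Research Funds for the Central Universities (328202203, 20230045Z0114); First-Class Discipline Construction Project of Beijing Electronic Science and Technology Institute (3201012).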

Keywords: vulnerability detection; deep learning; Python language; BERT model; natural language processing

Classification: TP393.08 [Automation and Computer Technology: Computer Application Technology]

