Abstract
Studying vulnerability similarity helps security researchers draw on historical vulnerability information to find solutions for new vulnerabilities. Little work has been done on vulnerability similarity so far, and model selection in the existing work lacks objective experimental support. This paper combines several word embedding techniques with deep learning autoencoders to compute semantic similarity from vulnerability description text. It also extracts multi-dimensional feature data from public databases such as NVD to compute feature similarity from the vulnerability's attributes, and designs a dual-perspective vulnerability similarity measurement algorithm and evaluation scheme based on NLP and feature fusion. Experiments compare the model combinations in terms of numerical distribution, similarity discrimination, and accuracy. The best model combination achieves an F1 score of up to 0.927 in vulnerability similarity determination.
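The dual-perspective idea can be illustrated with a minimal sketch. The abstract does not give the similarity metrics, the feature set, or the fusion rule, so everything below is an assumption made for illustration: cosine similarity stands in for the semantic score, a simple attribute-match ratio stands in for the feature score, and a weighted sum with a hypothetical weight `alpha` stands in for the fusion. The input vectors are placeholders for description embeddings (for example, autoencoder-compressed word embeddings).

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def feature_similarity(feat_a, feat_b):
    """Share of attributes (over the union of keys) that both records have with equal values.

    Assumption: features are categorical key/value pairs; the paper's actual
    feature set and metric are not stated in the abstract.
    """
    keys = set(feat_a) | set(feat_b)
    if not keys:
        return 0.0
    matches = sum(
        1 for k in keys
        if k in feat_a and k in feat_b and feat_a[k] == feat_b[k]
    )
    return matches / len(keys)


def fused_similarity(emb_a, emb_b, feat_a, feat_b, alpha=0.5):
    """Weighted sum of the semantic and feature scores.

    `alpha` is a hypothetical fusion weight, not a value from the paper.
    """
    semantic = cosine_similarity(emb_a, emb_b)
    feature = feature_similarity(feat_a, feat_b)
    return alpha * semantic + (1 - alpha) * feature


if __name__ == "__main__":
    # Toy inputs: identical embeddings, one of two attributes matching.
    score = fused_similarity(
        [1.0, 0.0], [1.0, 0.0],
        {"cwe": "CWE-79", "vector": "NETWORK"},
        {"cwe": "CWE-79", "vector": "LOCAL"},
    )
    print(score)
```

In a sketch like this, two vulnerabilities would be judged similar when the fused score exceeds a threshold chosen on labeled pairs, which is the kind of decision an F1 score evaluates.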
Authors
JIA Fan (贾凡)
KANG Shuya (康舒雅)
JIANG Weiqiang (江为强)
WANG Guangtao (王光涛)
Affiliations: School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China; Information Security Center, China Mobile Group Co., Ltd., Beijing 100053, China
Source
Netinfo Security (《信息网络安全》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2023, Issue 1, pp. 18-27 (10 pages)
Funding
Ministry of Education-China Mobile Research Fund [MCM20200106].
Keywords
natural language processing
deep learning
vulnerability similarity
word embedding