Cross-project Prediction Method of Security Bug Reports Based on Knowledge Graph

Abstract: Security bug reports (SBRs) describe security-critical vulnerabilities in software products. To reduce the risk of security attacks on software products, SBR prediction has attracted growing attention from researchers. In real development scenarios, however, the project that needs SBR prediction may belong to a new company or be newly launched, so too few labeled SBRs are available to build a prediction model. A simple solution is to use a transfer model, that is, to build the prediction model from data already labeled in other projects. Inspired by two recent studies in this field, and following the idea of security keyword filtering, this study proposes KG-SBRP (knowledge graph of security bug report prediction), a cross-project SBR prediction method that incorporates a knowledge graph. The text fields of SBRs are combined with CWE (common weakness enumeration) and CVE Details (common vulnerabilities and exposures) to construct triple rule entities; these entities are used to build a security vulnerability knowledge graph, and SBRs are identified from the entities and their relations in the graph. The data are divided into training and test sets for model fitting and performance evaluation. An empirical study on seven SBR datasets of different sizes shows that, compared with the mainstream methods FARSEC and Keyword matrix, the proposed method improves the F1-score by 11% on average in cross-project SBR prediction, and by 30% on average in within-project SBR prediction.
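
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' KG-SBRP implementation. It assumes a handful of hand-written (head, relation, tail) triples standing in for entities that would be derived from CWE, CVE Details, and report text, and it swaps in plain substring matching for the entity and relation recognition the paper describes. The names `TRIPLES`, `build_graph`, `matched_entities`, `predict_sbr`, and `f1_score` are hypothetical.

```python
# Hypothetical triples (head, relation, tail). In the paper these would be
# built from SBR text fields, CWE, and CVE Details; here they are made up.
TRIPLES = [
    ("buffer overflow", "is_a", "memory corruption"),
    ("memory corruption", "leads_to", "code execution"),
    ("sql injection", "is_a", "injection"),
    ("injection", "leads_to", "data leak"),
]


def build_graph(triples):
    """Store triples as an adjacency map: head -> [(relation, tail), ...]."""
    graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
    return graph


def matched_entities(report_text, graph):
    """Return graph entities (heads or tails) that occur in the report text.

    Assumption: simple lowercase substring matching replaces real
    entity recognition.
    """
    text = report_text.lower()
    entities = set(graph)
    for edges in graph.values():
        entities.update(tail for _, tail in edges)
    return {entity for entity in entities if entity in text}


def predict_sbr(report_text, graph, min_hits=1):
    """Flag a report as security-related if enough entities match."""
    return len(matched_entities(report_text, graph)) >= min_hits


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for boolean labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a cross-project setting, the triples would be built from labeled reports of a source project and then applied to the reports of a target project; `f1_score` is the metric used to compare the predictions against the target project's labels.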

Authors: ZHENG Wei; LIU Cheng-Yuan; WU Xiao-Xue; CHEN Xiang; CHENG Jing-Yuan; SUN Xiao-Bing; SUN Rui-Yang (School of Software, Northwestern Polytechnical University, Xi’an 710072, China; College of Information Engineering, Yangzhou University, Yangzhou 225127, China; School of Information Science and Technology, Nantong University, Nantong 226019, China; National Engineering Laboratory for Integrated Aero-space-ground-ocean Big Data Application Technology (Northwestern Polytechnical University), Xi’an 710072, China; Key Laboratory of Big Data Storage and Management (Northwestern Polytechnical University), Ministry of Industry and Information Technology, Xi’an 710172, China; State Key Laboratory of Information Security (Institute of Information Engineering, Chinese Academy of Sciences), Beijing 100093, China)