
Duplicate bug report detection by combining distributed representations of documents (cited by: 2)
Abstract: Duplicate bug report detection avoids repeated assignment and repair of multiple reports that describe the same bug, and can thus reduce software maintenance costs. To further improve detection accuracy, this paper proposes a duplicate bug report detection method that incorporates distributed representations of documents. First, a Doc2Vec model is trained on a large-scale bug report database and used to extract distributed representations of bug reports, encoding reports of varying length as fixed-length dense vectors. Next, the similarity between bug reports is computed by comparing these vectors; this similarity serves as a new feature that is combined with other features commonly used in duplicate bug report detection, and a machine learning algorithm is used to train a binary classification model. Experimental results on public duplicate bug report datasets from Bugzilla show that, compared with the representative method D_TS, the proposed method improves the F1 value by 2% on average, which indicates the effectiveness of the new feature.
Authors: ZENG Jie (曾杰); BEN Ke-rong (贲可荣); ZHANG Xian (张献); XU Yong-shi (徐永士) (College of Electronic Engineering, Naval University of Engineering, Wuhan 430033, China)
Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2021, No. 4, pp. 670-680 (11 pages)
Keywords: duplicate bug report; distributed representations of documents; Doc2Vec model; machine learning algorithm
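The feature-fusion step described in the abstract can be illustrated with a minimal sketch. It assumes that the report vectors are compared by cosine similarity (the abstract says only that the vectors are "compared") and uses short hypothetical vectors and hypothetical traditional features in place of real Doc2Vec output; none of the names or values below come from the paper.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length dense vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def fuse_features(vec_a: Sequence[float],
                  vec_b: Sequence[float],
                  other_features: Sequence[float]) -> List[float]:
    """Append the vector-similarity feature to the existing pairwise features."""
    return list(other_features) + [cosine_similarity(vec_a, vec_b)]


# Hypothetical 4-dimensional vectors standing in for Doc2Vec output
# (assumption: a real model would produce much longer vectors).
report_a = [0.2, 0.1, 0.4, 0.3]
report_b = [0.2, 0.1, 0.4, 0.3]
report_c = [-0.3, 0.4, -0.1, 0.2]

# Hypothetical traditional pairwise features (illustrative values only).
pair_ab = fuse_features(report_a, report_b, [0.8, 1.0])
pair_ac = fuse_features(report_a, report_c, [0.1, 0.0])
```

Each fused row, together with a duplicate or non-duplicate label, would then serve as one training example for the binary classifier. The abstract does not say which learning algorithm is used, so that step is left out of the sketch.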
