
A New N-gram-Model-Based Method for Detecting Duplicate Software Defect Reports (cited by: 2)

A New and Better Method of Detecting Duplicate Defect Reports Using N-gram Method
Abstract: Defect reports produced during software development and maintenance often contain a large number of duplicates. Detecting duplicate defect reports automatically and accurately saves considerable development and maintenance cost in defect assignment, fixing, and retesting. Building on the traditional vector space model detection method, this paper proposes a new duplicate defect report detection method based on the N-gram model; Section 2 of the paper describes the method in detail. Experiments on a small data set show that the method performs well when the parameter N is set to 3, 4, or 5 and when similarity is computed with the complete-sentence approach over only the summary field of each defect report. The method was then validated on a data set of 4,503 Firefox defect reports. The experiments show that, compared with the vector space model method, the N-gram model method raises the recall rate for duplicate defects by 25% to 55%.

Aim. The introduction of the full paper points out what we believe to be the shortcomings of existing papers in the open literature; hence we propose a new and better method. Subsection 1.2 briefly introduces the N-gram model. Section 2 explains our new and better method of detecting duplicate defect reports using the N-gram method. The titles of subsections 2.1, 2.2, 2.3, 2.4, 2.5, 2.7 are respectively tokenization, word stemming, synonym replacement, stop word removal, N-gram similarity calculation, and duplicate defect report detection accuracy measurement; in particular, Formula (6) in subsection 2.7 is very important for calculating the recall rate of our method. In Section 3, we select the N parameter, the complete-sentence syntax, and the summary information of software defect reports using a small subset of the Firefox defect repository, and we evaluate our method on a large subset of the Firefox defect repository comprising 4,503 defect reports. The experimental results, presented in Figs. 2 and 3, show preliminarily that the recall rate of our method increases by 25% to 55% compared with that of the traditional Vector Space Model method in detecting duplicate defect reports.
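The abstract names the pipeline stages (tokenization, stop word removal, N-gram similarity over the summary field, recall measurement) but gives no formulas. The following is a minimal sketch of that idea, not the authors' implementation. It assumes character-level N-grams compared with a Dice coefficient and a simple "found duplicates over all duplicates" recall; the paper's Formula (6) may differ. The stop word list and all function names are hypothetical, and stemming and synonym replacement are omitted.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop word list (not taken from the paper).
STOP_WORDS = {"the", "a", "an", "is", "in", "on", "to", "of", "when"}


def normalize(summary):
    """Lowercase, split on whitespace, drop stop words, rejoin with spaces."""
    tokens = [t for t in summary.lower().split() if t not in STOP_WORDS]
    return " ".join(tokens)


def char_ngrams(text, n):
    """Multiset of character n-grams taken over the whole normalized sentence."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_similarity(a, b, n=3):
    """Dice coefficient over n-gram multisets (an assumed similarity measure)."""
    ga, gb = char_ngrams(normalize(a), n), char_ngrams(normalize(b), n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 0.0
    shared = sum((ga & gb).values())  # multiset intersection
    return 2.0 * shared / total


def recall_rate(detected_duplicates, total_duplicates):
    """Share of known duplicates that the detector found (assumed definition)."""
    if total_duplicates == 0:
        return 0.0
    return detected_duplicates / total_duplicates
```

In use, each incoming summary would be scored against the summaries already in the repository with `ngram_similarity`, with N chosen from 3, 4, or 5 as the abstract suggests, and the highest-scoring reports presented as duplicate candidates.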
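Source: Journal of Northwestern Polytechnical University (《西北工业大学学报》); indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2010, No. 2, pp. 298-303 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (60970070)
Keywords: natural language processing systems; duplicate defect report; N-gram method; text similarity; N-gram similarity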

References (8)

  • 1. Anvik J, Hiew L, Murphy G C. Coping with an Open Bug Repository. Proceedings of the 2005 OOPSLA Workshop on Eclipse Technology Exchange, 2005, 35-39.
  • 2. Runeson P, Alexandersson M, Nyholm O. Detection of Duplicate Defect Reports Using Natural Language Processing. Proceedings of the 29th International Conference on Software Engineering, 2007, 499-510.
  • 3. Wang X Y, Zhang L, Xie T, Anvik J, Sun J. An Approach to Detecting Duplicate Bug Reports Using Natural Language and Execution Information. Proceedings of the 30th International Conference on Software Engineering, 2008, 461-470.
  • 4. Jalbert N, Weimer W. Automated Duplicate Detection for Bug Tracking Systems. Proceedings of the International Conference on Dependable Systems and Networks, 2008, 1-10.
  • 5. Hiew L. Assisted Detection of Duplicate Bug Reports. Master's Thesis, University of British Columbia, Canada, 2006.
  • 6. Ko A J, Myers B A, Chau D H. A Linguistic Analysis of How People Describe Software Problems. Proceedings of the Visual Languages and Human-Centric Computing, 2006, 127-134.
  • 7. Firefox Defect Repository. https://bugzilla.mozilla.org.
  • 8. The Stanford Natural Language Processing Group. http://nlp.stanford.edu/software/lex-parser.shtml.
