
Improved automatic classification algorithm for software bug reports in a cloud environment
Abstract: User-submitted software bug reports tend to be arbitrary, subjective, and short, so automatic classification accuracy is low and substantial manual intervention is required. As the Internet grows, the number of submitted bug reports keeps increasing, and improving the precision of automatic classification on massive data is drawing growing attention. By improving Term Frequency-Inverse Document Frequency (TF-IDF) to account for how a term's occurrence between classes and within a class affects text classification, an improved multinomial Naive Bayes (NB) algorithm based on a software bug report dataset was proposed, and a distributed version of the algorithm was implemented on the Hadoop platform using the MapReduce computing model. Experimental results show that the improved multinomial Naive Bayes algorithm raises the F1 value to 71%, which is 27 percentage points higher than the original algorithm. On massive data, running time can also be shortened by adding computing nodes, so the algorithm executes efficiently.
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2016, No. 5, pp. 1212-1215, 1221 (5 pages)
Funding: National Natural Science Foundation of China (61472082); Natural Science Foundation of Fujian Province (2014J01220)
Keywords: multinomial Naive Bayes; bug report; automatic text classification; Term Frequency-Inverse Document Frequency (TF-IDF); cloud computing
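
The abstract names the building blocks (TF-IDF term weights, multinomial Naive Bayes, and a MapReduce-style distributed implementation) but does not give the paper's inter-class and intra-class weighting formula. The following is therefore only a minimal sketch under stated assumptions: it uses a standard smoothed TF-IDF in place of the paper's improved weighting, runs in a single process rather than on Hadoop, and all function names, the tokenizer, and the sample reports are hypothetical illustrations.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def idf_table(docs):
    """Standard smoothed IDF, log((1 + N) / (1 + df)) + 1.

    Assumption: this is NOT the paper's improved weighting, which the
    abstract does not specify.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def map_doc(tokens, label, idf):
    """Map step: emit ((label, term), tf * idf) pairs for one report."""
    for term, tf in Counter(tokens).items():
        yield (label, term), tf * idf[term]


def reduce_weights(pairs):
    """Reduce step: sum the emitted weights per (label, term) key."""
    totals = defaultdict(float)
    for key, weight in pairs:
        totals[key] += weight
    return totals


def train(reports):
    """Train on a list of (text, label) pairs and return a model dict."""
    docs = [tokenize(text) for text, _ in reports]
    labels = [label for _, label in reports]
    idf = idf_table(docs)
    pairs = (pair for tokens, label in zip(docs, labels)
             for pair in map_doc(tokens, label, idf))
    weights = reduce_weights(pairs)
    class_total = defaultdict(float)
    for (label, _), weight in weights.items():
        class_total[label] += weight
    return {
        "weights": weights,          # summed TF-IDF per (label, term)
        "class_total": class_total,  # summed TF-IDF per label
        "prior": Counter(labels),    # number of reports per label
        "vocab": len(idf),
        "n": len(reports),
    }


def predict(model, text):
    """Return the label with the highest log-posterior."""
    counts = Counter(tokenize(text))
    best_label, best_score = None, -math.inf
    for label, n_c in model["prior"].items():
        score = math.log(n_c / model["n"])
        denom = model["class_total"][label] + model["vocab"]
        for term, tf in counts.items():
            weight = model["weights"].get((label, term), 0.0)
            score += tf * math.log((weight + 1.0) / denom)  # Laplace smoothing
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical sample data, for illustration only.
reports = [
    ("app crashes with null pointer exception on startup", "crash"),
    ("segfault crash when opening large file", "crash"),
    ("button label misaligned in settings dialog", "ui"),
    ("font color wrong in dark theme dialog", "ui"),
]
model = train(reports)
```

Because the per-class weights are plain sums over `(label, term)` keys, they can be computed on separate shards and merged afterwards, which is the property that makes this kind of training a natural fit for MapReduce. Calling `predict(model, "dialog font misaligned")` on the sample data selects the `ui` label.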


