
Software Bug Triage Method Based on LDA Topic Model (cited by: 11)
Abstract: Traditional software bug triage methods based on the Vector Space Model (VSM) achieve low triage accuracy because the feature space is high-dimensional, sparse, and noisy. To address this, the paper proposes a software bug triage method based on the Latent Dirichlet Allocation (LDA) topic model. Bug reports are mapped from the original high-dimensional word space to a low-dimensional semantic topic space, and triage is performed in that topic space. Experimental results show that the method achieves relatively high triage accuracy when used with SVM and KNN classifiers.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2011, No. 21, pp. 46-48 (3 pages)
Funding: National Natural Science Foundation of China (Grant No. 60873040)
Keywords: software bug triage; Latent Dirichlet Allocation (LDA) model; Markov Chain Monte Carlo (MCMC) method; Gibbs sampling; text classification; Vector Space Model (VSM)
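
The pipeline summarized in the abstract (map each bug report to a low-dimensional topic distribution, then classify in topic space) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy reports, the developer names `alice` and `bob`, the helper names `lda_gibbs` and `knn_predict`, and the hyperparameters `alpha`, `beta`, and `n_topics` are all assumptions chosen for illustration. The LDA step uses collapsed Gibbs sampling, and for brevity the sketch fits LDA on labeled and unlabeled reports together, which the paper may not do.

```python
import random
from collections import Counter

def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w2id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]    # topic-word counts
    nk = [0] * n_topics                         # total tokens per topic
    z = []                                      # topic assignment of every token
    for d, doc in enumerate(docs):              # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w2id[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = w2id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    # Smoothed document-topic proportions (each row sums to 1).
    return [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]

def knn_predict(train_vecs, train_labels, query, k=3):
    """Majority vote among the k nearest training vectors (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(vec, query)), label)
        for vec, label in zip(train_vecs, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: four triaged reports and one new report to assign.
reports = [
    "crash null pointer exception on startup".split(),
    "null pointer crash when loading plugin".split(),
    "button layout misaligned in dialog".split(),
    "dialog font layout wrong on resize".split(),
    "startup crash exception in plugin loader".split(),  # new, unlabeled
]
fixers = ["alice", "alice", "bob", "bob"]

theta = lda_gibbs(reports, n_topics=2)
predicted = knn_predict(theta[:4], fixers, theta[4], k=3)
print("suggested developer:", predicted)
```

The classifier only ever sees vectors of length `n_topics`, rather than vocabulary-sized term vectors, which is the dimensionality reduction the abstract credits for the accuracy gain. An SVM could be substituted for the nearest-neighbor vote without changing the topic-mapping step.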

References (6)

  • [1] Cubranic D, Murphy G C. Automatic Bug Triage Using Text Categorization[C]//Proc. of the 16th International Conference on Software Engineering and Knowledge Engineering. Edinburgh, UK: [s. n.], 2004.
  • [2] Anvik J, Hiew L, Murphy G C. Who Should Fix This Bug?[C]//Proc. of the 28th International Conf. on Software Engineering. Shanghai, China: [s. n.], 2006.
  • [3] Ahsan S N, Ferzund J, Wotawa F. Automatic Software Bug Triage System (BTS) Based on Latent Semantic Indexing and Support Vector Machine[C]//Proc. of the 4th International Conference on Software Engineering Advances. Porto, Portugal: [s. n.], 2009.
  • [4] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
  • [5] Shi Jing, Li Wanlong. Topic Word Extraction Method Based on the LDA Model[J]. Computer Engineering, 2010, 36(19): 81-83. (cited by: 47)
  • [6] Griffiths T L, Steyvers M. Finding Scientific Topics[J]. Proceedings of the National Academy of Sciences, 2004, 101(Suppl. 1): 5228-5235.


