
Research on retrieval methods for traceability between Chinese documentation and source code based on LDA
(Original Chinese title: 基于主题的文档与代码间关联关系的提取研究) Cited by: 3
Abstract: Analyzing the relatedness, or traceability, between software documentation and its program code is an important foundation for software analysis and comprehension. This paper examines the topics contained in a software system's Chinese documentation and program code, and the correlation between them. Taking into account the chapter structure and vocabulary space of software documents, as well as the structure, identifier namespace, and comment style of program code, it proposes a topic-word-based Traceability Retrieval Method (TRM), built on the LDA model, for analyzing the links between Chinese software documentation and code. The method extracts the topic words of a text segment according to the Shannon information of its words. Experimental results show that, compared with the LSI method, the topic-word method improves both recall and precision by 2% to 5%.
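The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea of ranking a segment's words by Shannon self-information. It assumes that a word's probability is estimated from its relative frequency in the whole corpus and that a word's score in a segment is its in-segment count times its self-information; both choices, the function names, and the toy data are hypothetical and are not taken from the paper. The LDA step is omitted entirely, and the tokens are assumed to be already segmented (Chinese word segmentation and identifier splitting are not shown).

```python
import math
from collections import Counter

def shannon_information(corpus_tokens):
    """Return I(w) = -log2 p(w), with p(w) estimated from corpus frequency (assumption)."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {w: -math.log2(c / total) for w, c in counts.items()}

def topic_words(segment_tokens, info, k=3):
    """Rank a segment's words by (count in segment) * I(w) and keep the top k.

    The weighting is an illustrative assumption, not the paper's formula.
    """
    tf = Counter(segment_tokens)
    scored = {w: tf[w] * info[w] for w in tf if w in info}
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scored, key=lambda w: (-scored[w], w))
    return ranked[:k]

# Hypothetical, pre-tokenized toy data (not from the paper).
doc_segment = ["user", "login", "password", "the", "the", "login"]
code_segment = ["login", "user", "check", "password", "the"]
corpus = doc_segment + code_segment + ["the"] * 10

info = shannon_information(corpus)
doc_topics = topic_words(doc_segment, info)
code_topics = topic_words(code_segment, info)

# Topic words shared by a document segment and a code segment can serve as
# a crude signal that the two are candidates for a traceability link.
shared = set(doc_topics) & set(code_topics)
print(doc_topics, code_topics, shared)
```

In this toy setup the very frequent word "the" carries little information and is not selected as a topic word, which is the behavior that motivates an information-based ranking over raw frequency.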
Authors: 许冶冰 (Xu Yebing), 刘超 (Liu Chao)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2013, No. 5, pp. 70-76 (7 pages)
Keywords: traceability recovery; topic model; Latent Dirichlet Allocation (LDA); reverse engineering

