Semantics-driven Program Clustering for Program Comprehension (一种面向程序理解的程序语义聚类技术)
Abstract: This paper applies semantic clustering to the unstructured natural-language information in source code to support developers in program comprehension. First, natural language processing techniques are used to preprocess the identifiers and comments in a program and convert the program into an intermediate representation, a term-document (term-frequency) matrix. Latent semantic indexing (LSI) is then applied to this matrix to produce a set of hierarchical clusters, and words are recommended as labels for each cluster to make the clusters easier to understand. The approach was evaluated on the open source project JEdit; clustering this project of about 50,000 lines of code took less than 1 minute. The technique can therefore cluster a program semantically in a short time and help developers understand it quickly.
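The pipeline outlined in the abstract (preprocessing, term-document matrix, LSI, hierarchical clustering, label recommendation) can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the tokenizer, the stop-word list, the LSI rank, the use of cosine similarity with average linkage, the frequency-based labels, and the four toy "files" are all assumptions chosen for the example.

```python
import re
import numpy as np

# Hypothetical stop-word list; the abstract does not say which one is used.
STOP_WORDS = {"the", "and", "for", "this", "that", "with"}

def tokenize(source: str) -> list[str]:
    """Split identifiers and comments into lower-case terms."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", source):
        # Split camelCase identifiers, e.g. "openFile" -> "open", "File".
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word)
        terms.extend(p.lower() for p in parts)
    return [t for t in terms if len(t) > 2 and t not in STOP_WORDS]

def term_document_matrix(docs: dict[str, str]):
    """Build a terms-by-documents matrix of raw term frequencies."""
    names = sorted(docs)
    tokens = [tokenize(docs[n]) for n in names]
    vocab = sorted({t for ts in tokens for t in ts})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(names)))
    for j, ts in enumerate(tokens):
        for t in ts:
            matrix[index[t], j] += 1
    return names, vocab, matrix

def lsi_embed(matrix: np.ndarray, rank: int) -> np.ndarray:
    """Project documents into a rank-k latent space via truncated SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(rank, len(s))
    docs = vt[:k].T * s[:k]                      # one row per document
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    return docs / np.where(norms == 0, 1.0, norms)

def agglomerate(vectors: np.ndarray, n_clusters: int) -> list[list[int]]:
    """Average-linkage agglomerative clustering on cosine similarity."""
    clusters = [[i] for i in range(len(vectors))]
    sim = vectors @ vectors.T                    # rows are unit length
    while len(clusters) > n_clusters:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = sim[np.ix_(clusters[a], clusters[b])].mean()
                if score > best:
                    best, pair = score, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)           # b > a, so index a stays valid
    return clusters

def recommend_labels(matrix, vocab, cluster, top=2):
    """Suggest the most frequent terms of a cluster as its label."""
    totals = matrix[:, cluster].sum(axis=1)
    order = np.argsort(-totals, kind="stable")[:top]
    return [vocab[i] for i in order]

# Toy input standing in for the comments and identifiers of four source files.
docs = {
    "BufferLoader.java": "// load buffer text from file\nloadBuffer readFile openFile fileBuffer",
    "BufferSaver.java": "// save buffer text to file\nsaveBuffer writeFile fileBuffer closeFile",
    "MenuPainter.java": "// paint menu item colors\npaintMenu menuColor drawItem",
    "ToolbarPainter.java": "// paint toolbar icon colors\npaintToolbar iconColor drawIcon",
}

names, vocab, matrix = term_document_matrix(docs)
clusters = agglomerate(lsi_embed(matrix, rank=2), n_clusters=2)
for cluster in clusters:
    print([names[i] for i in cluster], recommend_labels(matrix, vocab, cluster))
```

Stopping the merge loop at a fixed number of clusters is a simplification; recording each merge instead would give the full hierarchy that the abstract refers to.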
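Author: CHEN Ying (陈颖), School of Information Engineering, Yangzhou University, Yangzhou 225127, China
Source: Software Guide (《软件导刊》), 2019, No. 10, pp. 62-64 (3 pages)
Funding: Jiangsu Provincial Education Informatization Research Fund Project (20180104); Open Fund Project of the Information Technology Research Base of the Civil Aviation Administration of China (CAAC-ITRB-201704)
Keywords: program comprehension; semantic clustering; latent semantic indexing; semantic labelling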