Abstract
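Building on the latent semantic model and incorporating the characteristics of software documentation and program code, four improvement strategies are proposed: code clustering based on class inheritance relationships, classified weighting of code feature terms, introduction of a similarity thesaurus, and categorized search based on document type. Experimental results show that the four strategies can raise precision by about 15% while keeping recall unchanged. This indicates that, when recovering traceability links between code and documentation, taking their inherent characteristics into account helps improve the recall and precision of the retrieval system.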
Software documentation is usually written in natural language as free text and captures a large amount of useful information. Establishing traceability links between documentation and source code is helpful in software engineering management. Currently, the recovery of traceability links is mostly based on information retrieval techniques, e.g., the probabilistic model, the vector space model, and Latent Semantic Indexing (LSI). However, previous work treats documentation and source code as plain text files, without considering their software engineering features. Four strategies are proposed to improve the traditional LSI method based on the features of software documentation and source code, namely source code clustering, identifier classification, a similarity thesaurus, and hierarchical structure enhancement. Experimental results show that the four enhancement strategies can increase precision by about 15%. The special characteristics of documentation and source code should therefore be considered carefully when recovering traceability links between them.
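To make the baseline concrete, the following is a minimal sketch of plain LSI retrieval of the kind the abstract takes as its starting point. It is not the authors' implementation and applies none of the four strategies. The toy corpus `CODE`, the function names, the raw term-frequency weighting, and the choice `k=2` are all assumptions made for illustration. Code artifacts and a documentation fragment are projected into the same truncated SVD space and compared by cosine similarity; precision and recall are then computed for a retrieved set.

```python
import numpy as np

# Hypothetical code artifacts, reduced to bags of identifier-like words.
CODE = [
    "account balance deposit withdraw",
    "account deposit interest rate",
    "render window button click",
    "window button layout paint",
]

def term_matrix(texts, vocab):
    """Raw term-frequency matrix: one row per term, one column per text."""
    index = {term: i for i, term in enumerate(vocab)}
    m = np.zeros((len(vocab), len(texts)))
    for j, text in enumerate(texts):
        for tok in text.lower().split():
            if tok in index:  # words outside the vocabulary are ignored
                m[index[tok], j] += 1.0
    return m

def lsi_scores(code_texts, doc_text, k=2):
    """Cosine similarity between one document and each code artifact in a rank-k latent space."""
    vocab = sorted({t for text in code_texts for t in text.lower().split()})
    a = term_matrix(code_texts, vocab)
    u, s, _ = np.linalg.svd(a, full_matrices=False)
    uk = u[:, :min(k, len(s))]                      # top-k left singular vectors
    code_vecs = uk.T @ a                            # artifacts in latent space (k x n)
    q_vec = uk.T @ term_matrix([doc_text], vocab)[:, 0]
    norms = np.linalg.norm(code_vecs, axis=0) * np.linalg.norm(q_vec)
    return (q_vec @ code_vecs) / np.maximum(norms, 1e-12)

def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set of candidate links against the true links."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

In this sketch a candidate link would be reported whenever the score exceeds a chosen threshold. Lowering the threshold tends to raise recall at the expense of precision, which is the trade-off the reported 15% precision gain at constant recall refers to.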
Source
《电子学报》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2009, Issue B04, pp. 22-30 (9 pages)
Acta Electronica Sinica
Funding
National 863 Program (No. 2006AA01Z176)
National Natural Science Foundation of China (No. 90718018)
Keywords
information retrieval (IR)
traceability links
traceability recovery
program comprehension
reverse engineering