Abstract
Most existing clone tracking works on release versions of software and therefore loses much of the change information that clone code accumulates during development. In addition, clone evolution patterns are not clearly defined and are not distinguished by perspective. This paper proposes a method for tracking clone code based on modification logs and identifies evolution patterns from three perspectives: clone class, clone fragment, and clone code content. First, each commit is treated as a small version, and clones are detected in each version with NiCad. Second, clone classes are mapped initially using token-level edit-distance (Levenshtein) similarity. Third, clone fragments are mapped precisely using the modification log. Fourth, the clone class mapping is revised according to the clone fragment mapping results. Finally, clone evolution patterns are identified from each perspective. Experiments on nearly 8,000 versions of 6 open-source software projects show that more than 97% of clones evolve stably; the separate, merge, and complex evolution patterns each account for no more than 0.01%, and the consistent-change and inconsistent-change patterns each account for no more than 2%. In comparative experiments on several software projects against gCad, a well-regarded tool of the same kind, the method achieves noticeably higher recall (improved by 2%) and precision (improved by 2%), and it also runs faster than gCad in the same environment.
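The second step, the initial clone class mapping by token edit-distance similarity, could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the regex tokenizer, the normalization by the longer token sequence, and all function names are assumptions introduced here.

```python
import re
from typing import List


def tokenize(code: str) -> List[str]:
    # Hypothetical tokenizer: identifiers/keywords, integer literals,
    # and any other single non-whitespace character as its own token.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def token_edit_distance(a: List[str], b: List[str]) -> int:
    # Standard Levenshtein dynamic programme, applied to tokens
    # instead of characters. Only two rows are kept in memory.
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, 1):
        curr = [i]
        for j, tok_b in enumerate(b, 1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def token_similarity(code_a: str, code_b: str) -> float:
    # Assumed normalization: 1 - distance / length of the longer sequence.
    a, b = tokenize(code_a), tokenize(code_b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - token_edit_distance(a, b) / longest
```

Under these assumptions, a clone class in one commit would be paired with the class in the next commit whose representative fragment scores highest, for example via `token_similarity("int a = b + 1;", "int a = c + 1;")`; the later fragment-level step based on the modification log would then confirm or correct that pairing.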
Authors
GE Guangshuai (葛广帅)
LIU Dongsheng (刘东升)
ZHANG Liping (张丽萍)
HOU Min (侯敏)
BAO Sarenna (包萨仁娜)
College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot 010022, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2018, Issue 11, pp. 53-61 (9 pages)
Funding
National Natural Science Foundation of China (No. 61462071, No. 61363017)
Natural Science Foundation of Inner Mongolia (No. 2016MS0612, No. 2015MS0606)