摘要
针对当前Type-3克隆代码检测工具较少、效率偏低等问题,提出了一种基于Token的能有效检测Type-3克隆代码的检测方法。该方法同时能有效检测Type-1和Type-2克隆代码。首先将源代码Token化得到特定代码粒度的Token串,其次将所有Token串的定长子串进行映射,在对映射信息进行查询的基础上,利用编辑距离算法确定克隆对,然后通过并查集算法快速构建克隆群,最终反馈克隆代码信息。实现了原型工具FClones,利用基于代码突变的框架对工具进行了评价,并与领域内较优秀的两款工具Ni Cad及Sim Cad进行了对比。实验结果表明,FClones在检测三类克隆代码时查全率均不低于95%,查准率均不低于98%,能更好地检测Type-3克隆代码。
Aiming at the problems of less clone code detection tools and low efficiency for the current Type-3, an effective clone code detection method for Type-3 based on the levenshtein distance of token was proposed. Type-1, Type-2 and Type-3clone codes could be detected by the proposed method in an efficient way. Firstly, the source codes of a subject system were tokenized into some token sequences with specified code size. Secondly, each definite-sized substring of the token sequences was mapped with corresponding index. Thirdly, the clone pairs were built by the levenshtein distance algorithm and the clone groups were built by the disjoint-set algorithm on the basis of the mapping information query. Finally, the feedback information of clone codes were given. A prototype tool named FClones was implemented. It was evaluated by the code mutation-based framework and compared with two state-of-the-art tools Sim Cad and NiCad. The experimental results show that the recall of FCloens is equal to or greater than 95% and its precision is not lower than 98% in detecting all of these three types of clone codes. FClones can do better in detecting Type-3 clones than others.
出处
《计算机应用》
CSCD
北大核心
2015年第12期3536-3543,共8页
journal of Computer Applications
基金
国家自然科学基金资助项目(61363017
61462071)
内蒙古自然科学基金资助项目(2014MS0613)
内蒙古自治区硕士研究生科研创新基金资助项目(S20141013524)
内蒙古师范大学研究生科研创新基金资助项目(CXJJS14077)