Abstract
In a software system, identical or similar code fragments are called code clones. Researchers have proposed a number of clone detection methods, but these methods usually target a single version of a software system. In some scenarios, such as constructing clone evolution genealogies, every version of the system must be analyzed, which is particularly time-consuming. To address this, we propose a clone detection acceleration technique for multi-version software systems that quickly obtains the clones in each version. The technique uses inter-version method mapping to build a method version group for each method whose code is highly similar across versions, and selects the earliest version in each group as the sample method. The set of sample methods constitutes the historical image, on which clone detection is performed, while a method index is built between each sample method and its method version group. The original, full set of clone relationships is then restored from the clone detection results on the historical image and the method index. We applied the technique in clone detection experiments on 3,234 versions of 251 open-source projects, totaling 300 million lines of code; compared with unaccelerated detection, efficiency improved nearly fourfold.
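The pipeline described in the abstract (method version groups, sample methods, historical image, method index, restoration) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several simplifying assumptions are made here that the abstract does not state: methods are mapped across versions by name, "highly similar" is approximated with a `difflib` ratio threshold, and the clone detector is a toy pairwise comparison. All function names, thresholds, and sample data are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold):
    # Stand-in similarity test; a real system would use a proper detector.
    return SequenceMatcher(None, a, b).ratio() >= threshold


def build_history_image(versions, group_threshold=0.9):
    """versions: list of (version_id, {method_name: source}), oldest first.

    Returns (samples, index):
      samples: sample_id -> source of the earliest method in its group
      index:   sample_id -> [(version_id, method_name), ...] (the group)
    """
    samples, index, current = {}, {}, {}
    for version_id, methods in versions:
        for name, source in methods.items():
            sid = current.get(name)
            # Start a new group when the method is new or has drifted
            # too far from the sample of its current group.
            if sid is None or not similar(samples[sid], source, group_threshold):
                sid = (version_id, name)
                samples[sid] = source
                index[sid] = []
                current[name] = sid
            index[sid].append((version_id, name))
    return samples, index


def detect_clones(samples, clone_threshold=0.8):
    # Toy detector: runs only on the (much smaller) historical image.
    return [(a, b) for a, b in combinations(samples, 2)
            if similar(samples[a], samples[b], clone_threshold)]


def restore(clone_pairs, index):
    """Expand sample-level clone pairs to per-version clone pairs."""
    result = {}
    for a, b in clone_pairs:
        b_by_version = dict(index[b])  # version_id -> method_name
        for version_id, name_a in index[a]:
            name_b = b_by_version.get(version_id)
            if name_b is not None:  # both groups exist in this version
                pair = frozenset((name_a, name_b))
                result.setdefault(version_id, set()).add(pair)
    return result


# Hypothetical two-version project: add/plus are unchanged clones,
# log is rewritten in v2 and therefore gets a second sample method.
ADD = "def add(a, b):\n    return a + b\n"
PLUS = "def plus(a, b):\n    return a + b\n"
versions = [
    ("v1", {"add": ADD, "plus": PLUS,
            "log": "def log(msg):\n    print('[info]', msg)\n"}),
    ("v2", {"add": ADD, "plus": PLUS,
            "log": "def log(msg, level):\n    sys.stderr.write(level + ': ' + msg)\n"}),
]
samples, index = build_history_image(versions)
clone_pairs = detect_clones(samples)
restored = restore(clone_pairs, index)
```

In this sketch the detector compares four sample methods instead of the six methods spread over two versions; the saving grows with the number of versions in which a method stays essentially unchanged. Restoring pairs from samples is only as accurate as the grouping threshold allows, which is a property of this simplification rather than a claim about the paper's technique.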
Authors
Fang Weikang (方维康)
Wu Yijian (吴毅坚)
Zhao Wenyun (赵文耘)
School of Software, Fudan University, Shanghai 200438, China; Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 200438, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2022, Issue 4, pp. 14-20 (7 pages)
Funding
Shanghai Science and Technology Development Fund projects (18DZ1112100, 18DZ1112102).
Keywords
Code clone (代码克隆)
Clone detection (克隆检测)
Historical image (历史映像)
Method version group (方法版本组)