Abstract
When two or more code fragments in a software system are similar, they are called code clones. Studies have shown that code clones are widespread in software systems and keep growing over time, and as open-sourcing code has become a trend in the era of big data, the proportion of cloned code continues to rise. Existing work has found that code clones in software systems are harmful: they reduce system stability, make the code base redundant, and propagate software defects. To improve code quality, academia and industry have proposed many code clone detection methods. According to how much information they extract from the code, these methods fall into five categories: text-based, lexical (token-based), syntax-based, semantics-based, and metric-based, and each category differs in performance and in the scenarios it suits. This paper analyzes why software clones arise and what their advantages and disadvantages are, classifies the code clones found in software systems, and evaluates the strengths of the five categories of detection methods, finding that no category is superior in every respect, so different methods should be applied in different fields according to the requirements. For selected methods, it describes in detail technical characteristics such as the core idea, the languages detected, the datasets used for validation, and the detection results. Finally, the paper summarizes the application scenarios suited to clone detection techniques and outlines future directions for code clone detection methods and their applications.
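To make the difference between text-based and lexical (token-based) detection concrete, here is a minimal Python sketch. It is an illustration under our own assumptions, not one of the methods surveyed in the paper: `text_lines`, `normalized_tokens`, and `similarity` are hypothetical helpers, every non-keyword identifier is abstracted to `ID` and every literal to `LIT`, and `difflib.SequenceMatcher` stands in for a real clone-matching algorithm. The two sample fragments differ only in identifier names.

```python
import difflib
import io
import keyword
import tokenize

# Two fragments that differ only in identifier names (a renamed-identifier clone).
frag_a = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

frag_b = """
def sum_all(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""

# Token types that carry layout or comments rather than program content.
SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
        tokenize.DEDENT, tokenize.COMMENT, tokenize.ENDMARKER}


def text_lines(code: str) -> list[str]:
    """Text-based view: non-blank source lines with surrounding whitespace removed."""
    return [line.strip() for line in code.splitlines() if line.strip()]


def normalized_tokens(code: str) -> list[str]:
    """Lexical view (hypothetical scheme): identifiers become ID, literals become LIT."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")        # any non-keyword name: variable, function, ...
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")       # numeric and string literals
        else:
            out.append(tok.string)  # keywords and operators kept verbatim
    return out


def similarity(a: list[str], b: list[str]) -> float:
    """Order-aware similarity in [0, 1] between two sequences of units."""
    return difflib.SequenceMatcher(None, a, b).ratio()


print(similarity(text_lines(frag_a), text_lines(frag_b)))                # 0.0
print(similarity(normalized_tokens(frag_a), normalized_tokens(frag_b)))  # 1.0
```

Line-by-line comparison scores 0.0 because every line contains a renamed identifier, while the normalized token sequences match exactly. Abstracting identifiers is what lets lexical methods catch renamed clones that pure text matching misses; the trade-off is that structurally identical but unrelated code can also match.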
Authors
乐乔艺
刘建勋
孙晓平
张祥平
LE Qiao-yi; LIU Jian-xun; SUN Xiao-ping; ZHANG Xiang-ping (School of Computer Engineering and Science, Hunan University of Science & Technology, Xiangtan, Hunan 411100, China)
Source
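Computer Science (《计算机科学》), 2021, Issue S02, pp. 509-522 (14 pages)
Indexed in: CSCD (Chinese Science Citation Database); Peking University core journal list (北大核心)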
Funding
National Natural Science Foundation of China (61872139).