Abstract
Code clone detection is a fundamental research topic in software engineering and plays an important role in applications such as code snippet recommendation and software project maintenance. With the rapid growth of code in online repositories and the rapid development of information retrieval and machine learning, research on code clone detection has made new progress. This paper introduces the basic concepts and mainstream methods of code clone detection, focuses on the main information-retrieval-based and machine-learning-based methods of recent years, and reports experiments on a token-based method that combines information retrieval with deep learning.
Authors
王婷
牟永敏
张志华
WANG Ting; MU Yong-min; ZHANG Zhi-hua (School of Computer, Beijing Information Science and Technology University, Beijing 100101)
Source
《现代计算机》
2019, Issue 13, pp. 32-38 (7 pages)
Modern Computer
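Funding
Beijing Natural Science Foundation (Key Research Topic Project) (No. Z160002)
Open Project of the Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (No. ICDD2017XX)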
Keywords
Code Clone Detection
Software Engineering
Information Retrieval
Machine Learning