Code Clone Detection Based on Program Vector Tree (cited by: 4)
Abstract: Code cloning speeds up software development but also causes recurring defects and software quality problems. Some types of code clones have very low textual similarity, which makes them hard to detect. To address this problem, this paper proposes a code clone detection method based on the program vector tree. First, feature representations of lexical units are extracted with a statistical language model, and the semantic similarity between different literal words is analyzed. Second, the abstract syntax tree (AST) of each program is extracted by syntactic analysis, and each leaf node is assigned the feature representation of its corresponding literal word, which transforms the AST into a program vector tree. Finally, a weighted encoding rule that accounts for the differing importance of tree nodes encodes each program vector tree into a fixed-length vector, and code fragments with similar vector representations are reported as code clones. Experimental results on BigCloneBench, a large-scale benchmark of real code clones, show that when detecting Moderately Type-3 and Type-4 clones, which have low textual similarity, the method outperforms current mainstream methods including NiCad, Deckard, SourcererCC, and Oreo, which confirms its effectiveness.
Authors: ZENG Jie; BEN Kerong; ZHANG Xian; LI Xiaowei; ZHOU Quan (College of Electronic Engineering, Navy University of Engineering, Wuhan 430033, China; Jinghang Research Institute of Computing and Communication, Beijing 100074, China; School of Computer Science, Wuhan University, Wuhan 430072, China)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), indexed in CSCD and the Peking University Core Journals list, 2020, No. 10, pp. 1656-1669 (14 pages)
Keywords: code clone; code clone detection; abstract syntax tree (AST); program vector tree
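
The pipeline outlined in the abstract (parse to an AST, attach word vectors to nodes, encode the tree into a fixed-length vector with node weights, then compare vectors) can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions made only for illustration: it parses Python with the standard `ast` module, not the Java code found in BigCloneBench; `token_vector` is a hash-based stand-in for embeddings learned by a statistical language model; the depth-based `decay` weighting and the names `DIM`, `leaf_tokens`, `encode`, and `cosine` are hypothetical.

```python
import ast
import hashlib
import math

DIM = 16  # hypothetical embedding size


def token_vector(token):
    """Stand-in for a learned embedding: a deterministic vector derived from a hash."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(b - 127.5) / 127.5 for b in digest[:DIM]]


def leaf_tokens(node):
    """Literal words carried directly by a node (identifiers, attribute names, constants)."""
    if isinstance(node, ast.Name):
        return [node.id]
    if isinstance(node, ast.arg):
        return [node.arg]
    if isinstance(node, ast.Attribute):
        return [node.attr]
    if isinstance(node, ast.Constant):
        return [repr(node.value)]
    return []


def encode(source, decay=0.8):
    """Encode a code fragment as a fixed-length vector.

    Each node contributes the vectors of its type name and literal words,
    scaled by decay**depth (an assumed weighting: nodes near the root count more).
    """
    total = [0.0] * DIM
    stack = [(ast.parse(source), 0)]
    while stack:
        node, depth = stack.pop()
        weight = decay ** depth
        for word in [type(node).__name__] + leaf_tokens(node):
            vec = token_vector(word)
            for i in range(DIM):
                total[i] += weight * vec[i]
        for child in ast.iter_child_nodes(node):
            stack.append((child, depth + 1))
    return total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


SNIPPET_A = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

# Same code with a comment and different spacing (a Type-1 clone of SNIPPET_A).
SNIPPET_B = """
def total(xs):
    # running sum
    s = 0
    for x in xs:
        s +=   x
    return s
"""

# Same structure with renamed identifiers (a Type-2 clone of SNIPPET_A).
SNIPPET_C = """
def add_all(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""

if __name__ == "__main__":
    print("A vs B:", cosine(encode(SNIPPET_A), encode(SNIPPET_B)))
    print("A vs C:", cosine(encode(SNIPPET_A), encode(SNIPPET_C)))
```

Because comments and spacing do not appear in the AST, `SNIPPET_A` and `SNIPPET_B` encode to the same vector. With hash-based vectors, renamed identifiers in `SNIPPET_C` share nothing with the originals; a learned embedding of the kind the abstract describes would be needed for semantically related words to end up close together.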