Research and Implementation of Program Code Similarity Detection Techniques
Abstract Traditional similarity algorithms give low accuracy when used to check assignments in programming courses. After studying the longest common subsequence (LCS) algorithm and related algorithms and analyzing their strengths and weaknesses, this paper proposes a better-performing method for computing program similarity that combines structure-metric and attribute-counting techniques. The method first preprocesses the source program by deleting comments and whitespace, then determines the commonly used elements and structures. It uses Lex to count and extract program elements, and uses the open-source ucc to generate a syntax tree from which the corresponding syntactic structures are extracted. Finally, it builds feature vectors and computes the code similarity. Experimental results show that the method improves accuracy by 10.6% over the LCS algorithm.
Authors 卫军超, 耿楠
Source Computer Knowledge and Technology (《电脑知识与技术(过刊)》), 2017, No. 2X, pp. 39-40 (2 pages)
Funding University-level teaching reform project of 西安交通工程学院 (No. 150006B)
Keywords attribute counting; structure measurement; similarity measurement
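
The abstract names a pipeline (strip comments and whitespace, extract program elements, build feature vectors, compute similarity) and an LCS baseline. The Python sketch below illustrates those two ideas side by side and is not the authors' implementation. Everything in it is an assumption made for illustration: the regex tokenizer stands in for Lex, the `FEATURES` list is a hypothetical choice of "common elements", cosine similarity is one plausible way to compare feature vectors, and the syntax-tree step based on ucc is omitted entirely. The comment stripping is also naive and would mishandle `//` inside string literals.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical tokenizer standing in for Lex: identifiers, integers, single symbols.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]")

# Hypothetical set of "common elements" to count; the paper's actual list is not given.
FEATURES = ("if", "else", "for", "while", "return", "=", "+", "-", "*", "/", "<", ">")


def tokenize(source):
    """Drop C-style comments, then split into tokens (whitespace is discarded)."""
    no_comments = re.sub(r"/\*.*?\*/|//[^\n]*", " ", source, flags=re.DOTALL)
    return TOKEN.findall(no_comments)


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """Baseline score in [0, 1]: twice the LCS length over the total token count."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def feature_vector(tokens):
    """Attribute counting: how often each chosen element occurs."""
    counts = Counter(tokens)
    return [counts[f] for f in FEATURES]


def cosine(u, v):
    """Cosine similarity of two count vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    a = "int sum(int n){ int s=0; /* total */ for(int i=0;i<n;i++) s+=i; return s; }"
    b = "int sum(int n){ int t=0; // total\n for(int k=0;k<n;k++) t+=k; return t; }"
    ta, tb = tokenize(a), tokenize(b)
    print("LCS similarity:      ", lcs_similarity(ta, tb))
    print("Attribute similarity:", cosine(feature_vector(ta), feature_vector(tb)))
```

In this toy pair the two programs differ only in variable names and comment style, so the attribute-count vectors coincide while the token-level LCS score drops below 1. That gap is the kind of weakness in pure sequence matching that motivates combining attribute and structure information.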