Abstract
Traditional similarity algorithms achieve low precision when used to check assignments in programming courses. After studying the Longest Common Subsequence (LCS) algorithm and related algorithms and analyzing their strengths and weaknesses, this paper proposes a program similarity calculation method that combines structure measurement with attribute counting. The method first preprocesses the source program by deleting comments and spaces, then determines the commonly used elements and structures. Next, Lex is used to count and extract program elements, and the open-source ucc is used to generate a syntax tree, from which the corresponding syntactic structures are extracted. Finally, a feature vector is generated and the code similarity is calculated. Experimental results show that the method improves precision by 10.6% over the LCS algorithm.
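The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. A regular-expression tokenizer stands in for Lex, the syntax-tree step based on ucc is omitted, and the lists `KEYWORDS` and `OPERATORS` are hypothetical stand-ins for the "commonly used elements", which the abstract does not enumerate. The abstract also does not say how the feature vectors are compared, so cosine similarity is assumed here. A token-level LCS score is included as the baseline the method is compared against.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical element lists; the paper's actual choice is not given in the abstract.
KEYWORDS = ("if", "else", "for", "while", "return", "int", "float", "char")
OPERATORS = ("=", "+", "-", "*", "/", "<", ">")


def strip_comments_and_spaces(source: str) -> str:
    """Delete C-style comments and collapse whitespace.

    Simplification: comment markers inside string literals are not handled.
    """
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", " ", source)
    return re.sub(r"\s+", " ", source).strip()


def tokenize(source: str) -> list[str]:
    """Split into identifiers, integer literals, and single punctuation characters."""
    pattern = r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_0-9]"
    return re.findall(pattern, strip_comments_and_spaces(source))


def feature_vector(source: str) -> list[int]:
    """Attribute counting: how often each chosen element occurs."""
    counts = Counter(tokenize(source))
    return [counts[name] for name in KEYWORDS + OPERATORS]


def cosine_similarity(u: list[int], v: list[int]) -> float:
    """Cosine of the angle between two count vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lcs_length(a, b) -> int:
    """Length of the longest common subsequence, row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_similarity(src_a: str, src_b: str) -> float:
    """Baseline: LCS length normalized by the combined token count."""
    a, b = tokenize(src_a), tokenize(src_b)
    total = len(a) + len(b)
    return 2 * lcs_length(a, b) / total if total else 0.0


if __name__ == "__main__":
    original = "int s = 0; /* sum */ for (i = 0; i < n; i++) s = s + i; return s;"
    renamed = "int total=0; // sum\nfor (k = 0; k < n; k++) total = total + k; return total;"
    print(cosine_similarity(feature_vector(original), feature_vector(renamed)))
    print(lcs_similarity(original, renamed))
```

In this toy pair the second program only renames variables and changes comments and spacing, so the count vectors are identical while the token-level LCS score drops. This shows one way such a combination could behave on copied assignments; it is not evidence for the 10.6% figure reported in the paper.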
Source
Computer Knowledge and Technology (《电脑知识与技术(过刊)》)
2017, No. 2X, pp. 39-40 (2 pages)
Funding
University-level Teaching Reform Project of Xi'an Traffic Engineering Institute (No. 150006B)
Keywords
attribute counting
structure measurement
similarity measurement