Abstract
To address the problem that existing program code similarity measures ignore program semantics and can produce invalid measurements, a program code similarity measurement method based on the abstract syntax tree (AST) is proposed. Preprocessing first removes information that would be redundant when generating the AST; lexical and syntactic analysis then produces the corresponding AST. Next, using an adaptively selected threshold, the similarity between ASTs is computed from the program attribute and method sequences obtained by traversing them. Finally, the method decides whether plagiarism has occurred and generates a similarity detection report. Experimental results show that the method can effectively detect multiple kinds of plagiarism in Java program code.
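The pipeline described in the abstract (parse to an AST, traverse it into a sequence, compare sequences, apply a data-dependent threshold) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions: it parses Python source with the standard `ast` module rather than Java, it compares node-type sequences with `difflib.SequenceMatcher` in place of the paper's similarity formula, and the "mean plus k standard deviations" threshold in `flag_pairs` is a hypothetical stand-in, since the abstract does not state how the adaptive threshold is chosen. The names `node_sequence`, `similarity`, and `flag_pairs` are illustrative only.

```python
import ast
import difflib
import itertools
import statistics


def node_sequence(source: str) -> list[str]:
    """Pre-order sequence of AST node type names.

    Identifier names and literal values are not recorded, so renaming
    variables or functions does not change the sequence. Comments are
    already discarded by the parser.
    """
    def visit(node):
        yield type(node).__name__
        for child in ast.iter_child_nodes(node):
            # Skip Load/Store/Del context markers: they add noise, not structure.
            if not isinstance(child, ast.expr_context):
                yield from visit(child)

    return list(visit(ast.parse(source)))


def similarity(src_a: str, src_b: str) -> float:
    """Similarity in [0, 1] between the node-type sequences of two programs."""
    matcher = difflib.SequenceMatcher(
        None, node_sequence(src_a), node_sequence(src_b), autojunk=False
    )
    return matcher.ratio()


def flag_pairs(sources: dict[str, str], k: float = 1.0):
    """Score every pair and flag those above an assumed adaptive threshold.

    ASSUMPTION: the threshold is mean + k * population standard deviation
    of all pairwise scores. The paper's actual rule is not given in the abstract.
    """
    scores = {
        (a, b): similarity(sources[a], sources[b])
        for a, b in itertools.combinations(sorted(sources), 2)
    }
    values = list(scores.values())
    threshold = statistics.mean(values) + k * statistics.pstdev(values)
    flagged = {pair: s for pair, s in scores.items() if s > threshold}
    return threshold, flagged


if __name__ == "__main__":
    submissions = {
        "f": "def f(a, b):\n    return a + b\n",
        "g": "def g(x, y):\n    return x + y\n",  # same structure, renamed
        "h": "def h(items):\n    total = 0\n    for i in items:\n        total += i\n    return total\n",
    }
    threshold, flagged = flag_pairs(submissions)
    print(f"threshold = {threshold:.3f}")
    for pair, score in flagged.items():
        print(pair, f"{score:.3f}")
```

Because the threshold is derived from the score distribution of the whole submission set, it moves with the assignment: a task where all solutions look alike yields a higher cut-off than one with diverse solutions. That is the general motivation for an adaptive threshold, whatever concrete rule is used.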
Source
Journal of Jilin University (Information Science Edition)
CAS
2015, No. 1, pp. 99-104 (6 pages)
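Funding
Natural Science Foundation of the Jilin Provincial Science and Technology Department (20130101060JC)
Jilin Provincial Department of Education "Twelfth Five-Year Plan" Science and Technology Research Fund (2014132, 2014125)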
Keywords
similarity measurement
abstract syntax tree (AST)
similarity
adaptive threshold