
Research on an AST-Based Code Plagiarism Detection Method (cited by: 7)

AST-based plagiarism detection method
Abstract To detect plagiarism in programming course assignments, an AST-based plagiarism detection method is proposed. First, the code is parsed with a syntax analysis tool to generate the corresponding abstract syntax tree (AST). Sequence matching algorithms from computational biology are then used to calculate the similarity between programs. AST features are extracted from the similar parts of the programs to build space vectors, and cluster analysis on these vectors identifies "plagiarism groups". Experimental results show that the method detects plagiarism well and finds the "plagiarism groups" fairly accurately.
Source Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal, 2012, Issue 4, pp. 1660-1664 (5 pages)
Funding National Natural Science Foundation of China (60940027); Natural Science Foundation of Inner Mongolia (2010MS0906)
Keywords plagiarism detection; abstract syntax tree (AST); sequence alignment; vector space model (VSM); clustering
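
The first two steps named in the abstract (parsing code into an AST, then scoring similarity with a biological sequence matching algorithm) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Python source and the standard `ast` module as the parser, flattens each tree into a pre-order sequence of node type names, and uses a Smith-Waterman style local alignment as one possible sequence matching algorithm. The function names, the scoring values (`MATCH`, `MISMATCH`, `GAP`), and the normalization are all hypothetical choices made for illustration. The feature vector and clustering steps are not shown.

```python
import ast

# Hypothetical scoring parameters; the record does not state the values used.
MATCH, MISMATCH, GAP = 2, -1, -1


def node_sequence(source: str) -> list[str]:
    """Flatten a program's AST into a pre-order sequence of node type names."""
    seq: list[str] = []

    def visit(node: ast.AST) -> None:
        # Skip Load/Store/Del context markers, which carry no structure.
        if not isinstance(node, ast.expr_context):
            seq.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return seq


def local_alignment_score(a: list[str], b: list[str]) -> int:
    """Best Smith-Waterman style local alignment score between two sequences."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (MATCH if x == y else MISMATCH)
            cell = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def similarity(src_a: str, src_b: str) -> float:
    """Alignment score normalized by the best score the shorter program allows."""
    a, b = node_sequence(src_a), node_sequence(src_b)
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0
    return local_alignment_score(a, b) / (MATCH * shortest)


if __name__ == "__main__":
    original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
    renamed = "def acc(values):\n    r = 0\n    for v in values:\n        r += v\n    return r\n"
    print(similarity(original, renamed))
```

Because the sequence holds only node types, renaming identifiers leaves it unchanged, which is why a structural comparison of this kind resists the simplest disguises. Grouping more than two submissions would need the pairwise scores or extracted features fed into a clustering step, which this sketch leaves out.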









