
Plagiarism detection in student programs based on frequent closed sequence mining
Abstract: Plagiarism among student programs undermines the credibility of assessment, yet detecting it manually places a heavy workload on teachers. To address this problem, a program plagiarism detection model is proposed. First, each program is converted into a token sequence through lexical analysis, and the tokens are hashed into a numeric sequence. Next, the BIDE algorithm is used to mine frequent closed sequences. On this basis, similar code fragments are identified and the similarity between programs is calculated, which determines whether the programs are plagiarized. Experimental results show that, compared with the widely used plagiarism detection tool MOSS, the proposed method improves detection accuracy: it not only reports accurate similarity statistics but also displays the similar code fragments in a fairly intuitive way.
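The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The token pattern, the keyword set, the identifier/literal normalization, and the function names tokenize, encode and frequent_closed are all assumptions made for illustration. A dictionary-based integer mapping stands in for the paper's hashing step, and a brute-force miner stands in for BIDE: it yields the same kind of result (frequent closed subsequences) but is only practical on toy inputs.

```python
import re

# Hypothetical token pattern for a C-like language (an assumption, not from the paper):
# identifiers/keywords, integer literals, a few two-character operators, single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|&&|\|\||[^\s\w]")
KEYWORDS = {"int", "for", "while", "if", "else", "return"}

def tokenize(source):
    """Turn source text into a token sequence, normalizing identifiers and numbers."""
    tokens = []
    for lexeme in TOKEN_RE.findall(source):
        if lexeme in KEYWORDS:
            tokens.append(lexeme)
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.append("ID")   # every identifier collapses to one token
        elif lexeme[0].isdigit():
            tokens.append("NUM")  # every integer literal collapses to one token
        else:
            tokens.append(lexeme)
    return tokens

def encode(token_seqs):
    """Map each distinct token to a small integer (stand-in for the hashing step)."""
    table = {}
    encoded = [[table.setdefault(tok, len(table) + 1) for tok in seq]
               for seq in token_seqs]
    return encoded, table

def is_subsequence(pattern, seq):
    """True if pattern occurs in seq in order, gaps allowed."""
    it = iter(seq)
    return all(item in it for item in pattern)

def frequent_closed(db, min_sup):
    """Brute-force frequent closed subsequence mining (NOT BIDE; toy inputs only)."""
    items = sorted({x for seq in db for x in seq})
    frequent = {}
    frontier = [()]
    while frontier:  # grow patterns one item at a time, keeping the frequent ones
        next_frontier = []
        for prefix in frontier:
            for x in items:
                cand = prefix + (x,)
                sup = sum(is_subsequence(cand, seq) for seq in db)
                if sup >= min_sup:
                    frequent[cand] = sup
                    next_frontier.append(cand)
        frontier = next_frontier
    # A pattern is closed if no one-item-longer super-sequence has the same support.
    return {p: s for p, s in frequent.items()
            if not any(len(q) == len(p) + 1 and frequent[q] == s
                       and is_subsequence(p, q) for q in frequent)}

programs = ["int a = 0; a = a + 1;", "int b = 5; b = b + 2;"]
encoded, table = encode([tokenize(p) for p in programs])
closed = frequent_closed(encoded, min_sup=2)
```

Because identifiers and literals are normalized before encoding, the two sample programs map to the same integer sequence, so renaming variables alone would not hide the overlap. How the mined sequences are turned into a similarity score is not specified in the abstract and is left out of the sketch.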
Source: Journal of Jilin University: Engineering and Technology Edition (indexed in EI, CAS, CSCD, and the Peking University Core Journals list), 2015, No. 4, pp. 1260-1265 (6 pages)
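Funding: National Natural Science Foundation of China (61202092, 61173021); Specialized Research Fund for the Doctoral Program of Higher Education (20112302120052); Harbin Science and Technology Innovation Talent Special Project (RC2013QN010001); Heilongjiang Higher Education Society Key Planning Project for the 12th Five-Year Plan (HGJXHB1110957); Heilongjiang Provincial Universities Young Academic Backbone Project (1254G037)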
Keywords: computer software; plagiarism detection; frequent closed sequence mining; similarity; similar code


