Plagiarism detection in student programs based on frequent closed sequence mining (cited by: 1)

Abstract: Plagiarism among student programs lowers the credibility of assessment, while detecting it by hand places a heavy workload on teachers. To address this problem, a program plagiarism detection model is proposed. First, lexical analysis converts each program into a token sequence, which is then hash-mapped to a numeric sequence. Next, the BIDE algorithm mines frequent closed sequences from these numeric sequences. On this basis, similar code fragments are identified and the similarity between programs is computed to decide whether they are plagiarized. Experimental results show that, compared with the widely used detection tool MOSS, the proposed method improves detection accuracy: it gives accurate plagiarism statistics and also displays the similar code fragments in a fairly intuitive way.
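The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions made only for illustration: the regex lexer and its keyword subset, the dictionary that stands in for the hash mapping, the restriction to contiguous token runs, and the coverage-based similarity formula. In particular, `closed_frequent_runs` is a brute-force stand-in for BIDE, which mines closed sequential patterns far more efficiently and is not limited to contiguous runs.

```python
import re
from collections import defaultdict

# Hypothetical, very rough lexer for C-like code (assumption, not the paper's lexer).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
KEYWORDS = {"int", "for", "while", "if", "else", "return"}  # illustrative subset


def tokenize(src):
    """Turn source text into tokens; identifiers become ID, numbers become NUM."""
    tokens = []
    for tok in TOKEN_RE.findall(src):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def encode(tokens, table):
    """Map each distinct token to a small integer (stand-in for the hash mapping)."""
    return [table.setdefault(t, len(table) + 1) for t in tokens]


def closed_frequent_runs(seqs, min_sup=2, min_len=3):
    """Brute-force closed frequent *contiguous* runs. This is NOT BIDE."""
    occurs_in = defaultdict(set)
    for sid, seq in enumerate(seqs):
        for i in range(len(seq)):
            for j in range(i + min_len, len(seq) + 1):
                occurs_in[tuple(seq[i:j])].add(sid)
    freq = {p: len(s) for p, s in occurs_in.items() if len(s) >= min_sup}
    # A run is not closed if a one-token extension has the same support.
    not_closed = set()
    for q, sup in freq.items():
        for sub in (q[1:], q[:-1]):
            if freq.get(sub) == sup:
                not_closed.add(sub)
    return {p: sup for p, sup in freq.items() if p not in not_closed}


def similarity(a, b, min_len=3):
    """Assumed metric: share of tokens covered by runs common to both programs."""
    patterns = closed_frequent_runs([a, b], min_sup=2, min_len=min_len)

    def covered(seq):
        mark = [False] * len(seq)
        for p in patterns:
            n = len(p)
            for i in range(len(seq) - n + 1):
                if tuple(seq[i:i + n]) == p:
                    mark[i:i + n] = [True] * n
        return sum(mark)

    total = len(a) + len(b)
    return (covered(a) + covered(b)) / total if total else 0.0


table = {}
original = encode(tokenize(
    "int sum(int n){int s=0;for(int i=0;i<n;i++)s+=i;return s;}"), table)
renamed = encode(tokenize(
    "int total(int k){int acc=0;for(int j=0;j<k;j++)acc+=j;return acc;}"), table)
print(similarity(original, renamed))
```

Because identifiers are normalized before encoding, a copy that only renames variables yields the same numeric sequence as the original, so the two programs above score 1.0 under this assumed metric. The closed runs themselves mark where the shared fragments sit, which is the kind of information the abstract says the method can display.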

Authors: Wang Kechao (王克朝), Wang Tiantian (王甜甜), Su Xiaohong (苏小红), Ma Peijun (马培军)

Affiliations: School of Software, Harbin University; School of Computer Science, Harbin Institute of Technology

Source: Journal of Jilin University (Engineering and Technology Edition), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2015, No. 4, pp. 1260-1265 (6 pages)

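Funding: National Natural Science Foundation of China (61202092, 61173021); Specialized Research Fund for the Doctoral Program of Higher Education (20112302120052); Harbin Science and Technology Innovation Talent Special Project (RC2013QN010001); Heilongjiang Higher Education Society "Twelfth Five-Year Plan" Key Planning Project (HGJXHB1110957); Heilongjiang Province Young Academic Backbone Program for Regular Universities (1254G037)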

Keywords: computer software; plagiarism detection; frequent closed sequence mining; similarity; similar code

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]


References (10)

1 Shawky D M, Ali A F. An approach for assessing similarity metrics used in metric-based clone detection techniques[C]//The 3rd IEEE International Conference on Computer Science and Information Technology (ICCSIT), Chengdu, 2010: 580-584.
2 Brixtel R, Fontaine M, Lesner B, et al. Language-independent clone detection applied to plagiarism detection[C]//The 10th IEEE Working Conference on Source Code Analysis and Manipulation (SCAM), Timisoara, 2010: 77-86.
3 Dang Y, Ge S, Huang R, et al. Code clone detection experience at Microsoft[C]//Proceedings of the 5th International Workshop on Software Clones, ACM, 2011: 63-64.
4 Zibran M F, Roy C K. IDE-based real-time focused search for near-miss clones[C]//Proceedings of the 27th Annual ACM Symposium on Applied Computing, ACM, 2012: 1235-1242.
5 Higo Y, Kamiya T, Kusumoto S, et al. Method and implementation for investigating code clones in a software system[J]. Information and Software Technology, 2007, 49(9): 985-998.
6 Deng Aiping. Research on similarity measurement algorithms for program code[J]. Computer Engineering and Design, 2008, 29(17): 4636-4638. Cited by: 24
7 Gu Ping, Zhang Feng, Zhou Haitao. A similarity measurement method for program source code[J]. Computer Engineering, 2012, 38(6): 37-39. Cited by: 7
8 Zhang Liping, Liu Dongsheng, Li Yanchen, Zhong Mei. An AST-based code plagiarism detection method[J]. Application Research of Computers, 2011, 28(12): 4616-4620. Cited by: 8
9 Schleimer S, Wilkerson D S, Aiken A. Winnowing: local algorithms for document fingerprinting[C]//Proceedings of the ACM SIGMOD International Conference on Management of Data, ACM, 2003: 76-85.
10 Wang J, Han J. BIDE: efficient mining of frequent closed sequences[C]//IEEE 20th International Conference on Data Engineering, 2004: 79-90.





