Abstract
Plagiarism among student programs undermines the credibility of assessment, yet detecting it manually places a heavy workload on teachers. To address this problem, a model for detecting similar programs is proposed. First, each program is converted into a token sequence through lexical analysis, and the token sequence is hashed into a sequence of numbers. Next, frequent closed sequences are mined with the BIDE algorithm. On this basis, similar code fragments are identified and the similarity between programs is computed, which determines whether the programs are plagiarized. Experimental results show that, compared with the widely used detection tool MOSS, the proposed method improves detection accuracy: it not only reports accurate statistics on similar programs but also displays the similar code fragments in a relatively intuitive way.
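The first two stages of the pipeline (lexical analysis, then hashing tokens to numbers) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the regex lexer for a C-like language, the normalization of identifiers to `ID` and numeric literals to `NUM`, the use of CRC32 as the hash, and the function names `tokenize`, `hash_tokens`, `kgrams`, and `similarity`. The final step is also a deliberately simple stand-in: it compares sets of contiguous k-grams with a Jaccard ratio and does not implement BIDE frequent closed sequence mining.

```python
import re
import zlib

# Assumed keyword subset of a C-like language (illustrative, not from the paper).
KEYWORDS = {"int", "float", "char", "void", "if", "else", "for", "while", "return"}

# Identifiers/keywords, numeric literals, a few two-character operators,
# then any other single non-space symbol.
TOKEN_RE = re.compile(
    r"[A-Za-z_]\w*|\d+(?:\.\d+)?|==|!=|<=|>=|&&|\|\||\+\+|--|[^\s\w]"
)


def tokenize(source):
    """Lexical analysis: turn source text into a normalized token sequence."""
    tokens = []
    for lexeme in TOKEN_RE.findall(source):
        if lexeme in KEYWORDS:
            tokens.append(lexeme)
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.append("ID")   # assumption: renamed identifiers should still match
        elif lexeme[0].isdigit():
            tokens.append("NUM")  # assumption: literal values are ignored
        else:
            tokens.append(lexeme)
    return tokens


def hash_tokens(tokens):
    """Map each token to an integer; CRC32 is an assumed, deterministic choice."""
    return [zlib.crc32(t.encode("utf-8")) for t in tokens]


def kgrams(seq, k):
    """Set of contiguous length-k windows of a numeric sequence."""
    return {tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def similarity(src_a, src_b, k=4):
    """Jaccard similarity over k-grams: a simplified stand-in, NOT BIDE mining."""
    a = kgrams(hash_tokens(tokenize(src_a)), k)
    b = kgrams(hash_tokens(tokenize(src_b)), k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Under these assumptions, two programs that differ only in identifier names yield identical numeric sequences, so renaming variables alone does not lower the score. A faithful implementation would replace `kgrams` and `similarity` with closed sequential pattern mining over the numeric sequences, as the abstract describes.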
Source
Journal of Jilin University (Engineering and Technology Edition)
EI
CAS
CSCD
Peking University Core Journals
2015, Issue 4, pp. 1260-1265 (6 pages)
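Funding
National Natural Science Foundation of China (61202092, 61173021)
Specialized Research Fund for the Doctoral Program of Higher Education (20112302120052)
Harbin Special Fund for Science and Technology Innovation Talents (RC2013QN010001)
Key Planning Project of the Heilongjiang Higher Education Society for the Twelfth Five-Year Plan (HGJXHB1110957)
Young Academic Backbone Program of Regular Higher Education Institutions of Heilongjiang Province (1254G037)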
Keywords
computer software
plagiarism detection
frequent closed sequence mining
similarity
similar code