Abstract
Reviews the current state of research on duplication detection for academic papers in China and abroad and, in view of the open problems, proposes new directions for future work aimed at improving recall and precision: building a corpus of academic papers in a given discipline; using information theory as a tool to build a statistical language model for that discipline based on the corpus; designing a similarity algorithm, tailored to the characteristics of academic-paper plagiarism, that assigns different weight functions to the metadata elements describing the semantic content of a resource; testing the model and algorithm with the Lemur toolkit on standard TREC document collections; and comparing against the Turnitin plagiarism detection system to evaluate the efficiency and effectiveness of the detection computation.
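The similarity computation outlined above, in which different metadata elements receive different weights, can be sketched as follows. This is a minimal illustration only, not the authors' algorithm: the field names, weight values, and the use of cosine similarity over bag-of-words vectors are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def metadata_similarity(doc_a: dict, doc_b: dict, weights: dict) -> float:
    # Weighted sum of per-field similarities; each metadata element
    # contributes according to its (hypothetical) weight function.
    score = 0.0
    for field, w in weights.items():
        terms_a = Counter(doc_a.get(field, "").lower().split())
        terms_b = Counter(doc_b.get(field, "").lower().split())
        score += w * cosine(terms_a, terms_b)
    return score

# Hypothetical weights: title overlap counts more than abstract overlap.
WEIGHTS = {"title": 0.5, "keywords": 0.3, "abstract": 0.2}

doc1 = {"title": "plagiarism detection model",
        "keywords": "plagiarism detection language model",
        "abstract": "a statistical language model for duplication detection"}
doc2 = {"title": "plagiarism detection model",
        "keywords": "text similarity algorithm",
        "abstract": "text similarity for plagiarism detection"}
print(round(metadata_similarity(doc1, doc2, WEIGHTS), 3))
```

With weights summing to 1, the score stays in [0, 1]; identical records score 1.0, disjoint records 0.0.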
Source
《图书情报工作》 (Library and Information Service)
CSSCI; Peking University Core Journal (北大核心)
2009, Issue 5, pp. 111-114 (4 pages)
Funding
One of the research outputs of the Jiangsu University Doctoral Innovation Fund project "A Model and Algorithm for Academic Paper Plagiarism Detection" (Project No. CX08B-18X).
Keywords
academic papers; duplication detection; plagiarism detection; statistical language model; text similarity algorithm