Abstract
Detecting copied and plagiarized papers is an important part of intellectual property protection, and many detection methods and systems have been proposed. This paper surveys the field along the following lines: the definition of copying and plagiarism; text representation (the Vector Space Model, the Generalized Vector Space Model, and Latent Semantic Indexing); the research content of text similarity; methods for computing text similarity (statistics-based methods and methods based on semantic understanding); the two main families of techniques, digital fingerprinting and word-frequency statistics; and plagiarism detection systems. It classifies and compares the main approaches proposed in the field, summarizes recent progress, points out difficulties that remain to be overcome, and suggests topics and directions for future research.
Source
《情报学报》
CSSCI
Peking University Core Journals (北大核心)
2007, No. 4, pp. 567-573 (7 pages)
Journal of the China Society for Scientific and Technical Information
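Funding
Jiangxi Provincial Natural Science Foundation project (Application of Program Slicing Techniques in Software Formalization); Key Project of the Jiangxi Provincial Education Science "11th Five-Year Plan" (Research on an Evaluation System for the Research Competitiveness of Jiangxi Universities); Project of the Jiangxi Provincial Social Science "11th Five-Year Plan" (Research on Evaluating the Innovativeness of Internal University Research Output and the Corresponding Management System Reform); and a Jiangxi University of Finance and Economics university-level project (Program Slicing Techniques in Software Formalization…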
Keywords
plagiarism detection
digital fingerprinting
word frequency statistics