Abstract
This paper proposes a document copy detection technique based on sentence similarity. It captures the global features of a document while also taking the document's structural information into account, which earlier detection algorithms could not do at the same time, and thereby improves detection accuracy. Finally, the detection results of the algorithm are compared with those of other typical algorithms. The experiments show that the algorithm is feasible.
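The abstract does not spell out the algorithm, so the following is only a generic sketch of sentence-level copy detection, not the authors' method. It assumes Jaccard overlap of word sets as the similarity measure and a hypothetical threshold of 0.8; the function names `split_sentences`, `sentence_similarity`, and `copy_ratio` are illustrative.

```python
import re


def split_sentences(text):
    """Split text on common Chinese and English sentence terminators."""
    parts = re.split(r"[。！？.!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def sentence_similarity(a, b):
    """Jaccard overlap of lowercase word sets (an assumed measure).

    Words are split on whitespace, so Chinese text would first need
    a word segmenter; that step is omitted here.
    """
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def copy_ratio(suspect, source, threshold=0.8):
    """Fraction of suspect sentences that closely match a source sentence."""
    suspect_sents = split_sentences(suspect)
    source_sents = split_sentences(source)
    if not suspect_sents or not source_sents:
        return 0.0
    matched = sum(
        1
        for s in suspect_sents
        if max(sentence_similarity(s, o) for o in source_sents) >= threshold
    )
    return matched / len(suspect_sents)
```

Comparing sentence by sentence keeps some structural information, since each sentence is matched as a unit, while the ratio over all sentences gives a document-level score.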
Source
New Technology of Library and Information Service (《现代图书情报技术》)
CSSCI
Peking University Core Journal (北大核心)
2007, Issue 11, pp. 63-66 (4 pages)
Keywords
Document copy detection
Sentence similarity
Fingerprints