
Duplicate Checking Algorithm of Document Partial Content Based on Particle Swarm Optimization
Abstract: Existing document similarity algorithms can compute the overall similarity of two documents, but they cannot locate the duplicated or most similar sub-content within them. To address this, the paper proposes a duplicate checking algorithm for sub-content inside documents based on Particle Swarm Optimization (PSO). PSO is used to search for the position and length of the best-matching sub-content in the two documents, and an encoding of the particles is given for this purpose. A correlation function (a related coefficient of strings) is defined to measure the degree of similarity between character strings, and the PSO evaluation function is designed on top of it. A hybrid mutation PSO algorithm is used to search for the most similar sub-content quickly and accurately. Simulation experiments indicate that the algorithm determines the position and length of the duplicated or most similar sub-content in two documents quickly and accurately.
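The search described in the abstract can be illustrated with a minimal sketch. The abstract does not give the particle encoding, the correlation function, or the evaluation function, so everything below is an assumption made for illustration: each particle is encoded as (start in document A, start in document B, length), Python's `difflib.SequenceMatcher` ratio stands in for the paper's string correlation function, the fitness is the window length weighted by a sharpened similarity (the `sharpness` exponent is hypothetical), and a plain global-best PSO is used instead of the hybrid mutation variant the paper mentions.

```python
import random
from difflib import SequenceMatcher


def similarity(s, t):
    """Stand-in similarity in [0, 1]; NOT the paper's correlation function."""
    if not s or not t:
        return 0.0
    return SequenceMatcher(None, s, t, autojunk=False).ratio()


def decode(pos, len_a, len_b, min_len):
    """Round a continuous particle to a valid (start_a, start_b, length)."""
    max_len = min(len_a, len_b)
    n = min(max(int(round(pos[2])), min_len), max_len)
    i = min(max(int(round(pos[0])), 0), len_a - n)
    j = min(max(int(round(pos[1])), 0), len_b - n)
    return i, j, n


def fitness(doc_a, doc_b, i, j, n, sharpness=4):
    """Assumed evaluation function: length weighted by sharpened similarity."""
    return n * similarity(doc_a[i:i + n], doc_b[j:j + n]) ** sharpness


def pso_search(doc_a, doc_b, particles=30, iters=60, min_len=4, seed=0):
    """Plain global-best PSO over (start_a, start_b, length); maximizes fitness."""
    rng = random.Random(seed)
    len_a, len_b = len(doc_a), len(doc_b)
    hi = [len_a, len_b, min(len_a, len_b)]  # upper bound per dimension

    def score(p):
        return fitness(doc_a, doc_b, *decode(p, len_a, len_b, min_len))

    xs = [[rng.uniform(0, h) for h in hi] for _ in range(particles)]
    vs = [[0.0] * 3 for _ in range(particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [score(x) for x in xs]
    g = max(range(particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    w, c1, c2 = 0.7, 1.5, 1.5  # common textbook settings, not from the paper
    for _ in range(iters):
        for k in range(particles):
            for d in range(3):
                vs[k][d] = (w * vs[k][d]
                            + c1 * rng.random() * (pbest[k][d] - xs[k][d])
                            + c2 * rng.random() * (gbest[d] - xs[k][d]))
                # Move the particle and clamp it to the search box.
                xs[k][d] = min(max(xs[k][d] + vs[k][d], 0.0), float(hi[d]))
            f = score(xs[k])
            if f > pbest_f[k]:
                pbest[k], pbest_f[k] = xs[k][:], f
                if f > gbest_f:
                    gbest, gbest_f = xs[k][:], f
    return decode(gbest, len_a, len_b, min_len), gbest_f
```

Calling `pso_search(text_a, text_b)` returns a tuple `((start_a, start_b, length), score)`, so the candidate passages are `text_a[start_a:start_a + length]` and `text_b[start_b:start_b + length]`. Because PSO is a stochastic heuristic, the result is a good candidate rather than a guaranteed optimum; the fixed `seed` only makes runs repeatable.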
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2011, No. 20, pp. 203-205 (3 pages)
Funding: Supported by a Zhejiang Provincial Department of Education fund project (Y200908502)
Keywords: duplicate checking; similarity function; Particle Swarm Optimization (PSO); evaluation function; character string