Application of Duplication Removal Method of Rabin Fingerprint in Search Engine (Rabin指纹去重算法在搜索引擎中的应用)
Cited by: 1
Abstract: Search engines working over massive data are slow, consume large amounts of storage, and remove duplicate web pages poorly. To address these problems, this paper proposes a deduplication method based on the Rabin fingerprint algorithm. The method removes duplicate URLs and, for the URLs that remain, also removes pages whose content is identical or similar. Experiments show that it effectively improves search speed, saves storage space, and improves search precision.
Author: 贺建英 (He Jianying)
Source: Computer Systems & Applications (《计算机系统应用》), 2015, No. 7, pp. 128-131 (4 pages)
Funding: National Archives Administration project (国家档案局, 2014-X-65)
Keywords: Rabin fingerprinting method; search engine; duplicate removal; URL; massive data
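
The two-stage deduplication described in the abstract (first by URL, then by page content) can be illustrated with a minimal Python sketch. It is not the paper's implementation. Several choices are assumptions made only for illustration: the fixed degree-64 polynomial (Rabin's scheme chooses an irreducible polynomial at random), the use of word shingles with Jaccard resemblance as the similarity measure (the abstract does not say how similarity is measured), the 0.8 threshold, and the names `rabin_fingerprint`, `shingles`, `resemblance`, and `Deduplicator`.

```python
POLY_DEGREE = 64
# x^64 + x^4 + x^3 + x + 1, an irreducible polynomial over GF(2).
# Assumption: a fixed polynomial is used here; Rabin's scheme picks one at random.
POLY = (1 << POLY_DEGREE) | 0x1B


def rabin_fingerprint(data: bytes) -> int:
    """Read data as a polynomial over GF(2) and reduce it modulo POLY."""
    fp = 1  # prepend a 1 bit so that leading zero bytes change the result
    for byte in data:
        for shift in range(7, -1, -1):
            fp = (fp << 1) | ((byte >> shift) & 1)
            if fp >> POLY_DEGREE:  # degree reached 64: subtract (XOR) POLY
                fp ^= POLY
    return fp


def shingles(text: str, k: int = 4) -> set[int]:
    """Fingerprint every window of k consecutive words (hypothetical choice of k)."""
    words = text.split()
    if len(words) <= k:
        return {rabin_fingerprint(" ".join(words).encode("utf-8"))}
    return {
        rabin_fingerprint(" ".join(words[i:i + k]).encode("utf-8"))
        for i in range(len(words) - k + 1)
    }


def resemblance(a: set[int], b: set[int]) -> float:
    """Jaccard resemblance of two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class Deduplicator:
    """Reject duplicate URLs first, then identical or similar page content."""

    def __init__(self, threshold: float = 0.8):  # threshold is an assumed value
        self.threshold = threshold
        self.seen_urls: set[int] = set()
        self.kept_pages: list[set[int]] = []

    def add(self, url: str, content: str) -> bool:
        """Return True if the page is kept, False if it is a duplicate."""
        url_fp = rabin_fingerprint(url.encode("utf-8"))
        if url_fp in self.seen_urls:
            return False  # stage 1: URL already seen
        self.seen_urls.add(url_fp)
        page = shingles(content)
        if any(resemblance(page, kept) >= self.threshold for kept in self.kept_pages):
            return False  # stage 2: content identical or similar to a kept page
        self.kept_pages.append(page)
        return True
```

Calling `Deduplicator().add(url, content)` for each crawled page returns `True` only for pages worth storing. The linear scan over kept pages keeps the sketch short; a production system would need an index over the fingerprints instead.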