摘要
互联网的迅猛发展导致网络中的网页呈指数级别爆炸式增长。为解决在海量网页中寻找信息的问题,搜索引擎成为了人们使用互联网的重要工具。提出了一种基于净化网页的改进消重算法,并将它与传统的消重算法进行了比较。该算法结合关键字搜索和签名(计算指纹)搜索各自的优势来完成网页搜索消重。实验结果证明该方法对网页消重效果很好,提高了网页消重的查全率和查准率。
The internet's development led to the rapid development on the explosive exponential growth level. To look for useful information, search engines have become one of the most important network tools. This paper presents an improved algorithm that is based on purified webpage and compared with the conventional algorithms. The algorithm combines the advantages of keyword search method and signature (calculated fingerprint) search method for the removal of duplicate pages. The experiments results certify that the algorithm improve the recall and precision.
出处
《计算机系统应用》
2011年第12期197-199,共3页
Computer Systems & Applications
关键词
网页消重
净化网页
关键字
签名
duplicate webpage
elimination algorithm
Webpage purification
keywords
fingerprint