Abstract
Current search engines increasingly reveal their shortcomings. When a user enters specific keywords, the returned results often number in the thousands or even millions and contain large amounts of duplicate and junk information, so the user still spends a long time filtering out the pages of interest. In other cases, important pages clearly exist on the Web yet are never discovered by the search engine's robot. Addressing these problems, this paper focuses on the search strategy of the search engine and improves the search algorithm so that, already during the search phase, the robot can fully process the URL list with which it interacts frequently. The PageRank of a page is computed from its content, its HTML structure, and the hyperlinks it contains, so that the URL list can be reordered by importance. Preliminary experimental results show significant improvements over the original algorithm.
With the explosive growth of the WWW, search engines are becoming more and more important, and a large number of users rely on them to find the information they need. At present, however, after the user inputs a query, a search engine often returns a huge set of retrieved documents, many of which are irrelevant to the user, making it very difficult to sift out the specific document wanted. On the other hand, robots fail to retrieve some important homepages at all. In this paper we present a search algorithm based on processing the URL queue efficiently. According to the content of a page, its HTML structure, and the hyperlinks among pages, we evaluate the importance of these homepages so that the robot can adjust the order of the URL list. Preliminary experiments show significant improvements over the original search algorithm.
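The abstract describes computing PageRank over the hyperlink graph and reordering the crawler's URL frontier by importance. The paper itself does not give code, so the following is only a minimal sketch of that idea: the link graph, page names, damping factor, and iteration count are all illustrative assumptions, and classic link-only PageRank is used here in place of the paper's combined content/structure/link score.

```python
# Hypothetical link graph: page -> list of pages it links to.
# The page names are invented for illustration only.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank by power iteration over the hyperlink graph."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                # A page passes its rank evenly along its out-links.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Reorder the URL frontier by descending importance, so the robot
# fetches high-PageRank pages first.
rank = pagerank(links)
frontier = sorted(links, key=lambda p: rank[p], reverse=True)
```

In this toy graph page "C" collects in-links from three pages, so it moves to the front of the frontier, while "D", which no page links to, drops to the back.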
Source
《情报学报》
CSSCI
Peking University Core Journals (北大核心)
2002, No. 2, pp. 130-133 (4 pages)
Journal of the China Society for Scientific and Technical Information