Optimization of the Robot Search Algorithm in Search Engines (cited by: 21)

Improvement of the Robot Search Algorithm
Abstract: With the explosive growth of the WWW, search engines are becoming increasingly important, and a large number of users rely on them to find information of interest. Current search engines, however, increasingly show their limitations. After a user enters specific keywords, a query often returns thousands or even millions of results, many of them duplicates or junk, so sifting out the pages of interest still takes a long time. Conversely, some important pages that do exist on the Web are never discovered by the search engine's robot. This paper addresses both problems by focusing on the search strategy and improving the search algorithm so that, during the crawl itself, the robot efficiently processes the URL list it interacts with most frequently. The importance (PageRank) of each page is computed from its content, its HTML structure, and the hyperlinks it contains, so that the URL list can be reordered by importance. Preliminary experiments show significant improvements over the original search algorithm.
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》; CSSCI, Peking University Core Journal), 2002, No. 2, pp. 130-133 (4 pages)
Keywords: search engine; hyperlink; Robot; PageRank; search strategy; search module; search algorithm; optimization algorithm
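
The abstract describes reordering the robot's URL list by page importance. The sketch below illustrates that general idea, and it is not the paper's algorithm. The abstract does not give the scoring formula, which also draws on page content and HTML structure, so this example substitutes plain link-based PageRank computed by power iteration. The function names `pagerank` and `order_frontier`, the damping factor of 0.85, and the toy link graph are all assumptions made for illustration.

```python
import heapq


def pagerank(links, damping=0.85, iterations=50):
    """Link-only PageRank by power iteration over {page: [outlinked pages]}.

    Assumption: this is the textbook formulation, not the paper's scoring,
    which (per the abstract) also uses page content and HTML structure.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the uniform "teleport" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


def order_frontier(frontier, rank):
    """Return the pending URLs with the highest-ranked one first."""
    # heapq is a min-heap, so scores are negated to pop the largest first.
    heap = [(-rank.get(url, 0.0), url) for url in frontier]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


# Hypothetical toy link graph; the keys stand in for real URLs.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(links)
crawl_order = order_frontier(["d", "b", "a", "c"], scores)
print(crawl_order)
```

A heap is used here because a real robot keeps adding URLs to the list as it crawls. A priority queue lets new entries be inserted without re-sorting the whole list, although the scores themselves would still need periodic recomputation as the known link graph grows.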

