
Research on Integrative Crawling Strategy of Subject Searching ROBOT (cited by: 6)
Abstract: Building on an analysis and evaluation of the crawling strategies commonly used by subject-searching ROBOTs, this paper combines a triple-filtering technique with an improved Shark heuristic search algorithm to design an integrative crawling strategy for an automatic subject search engine ROBOT. Because the integrative strategy takes page relevance, subject precision, and page quality into account during the crawl, using it to download subject-related pages from the Web both broadens the resource coverage of a subject through link analysis and keeps the search results highly relevant to that subject.
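Source: Journal of Wuhan University of Technology (《武汉理工大学学报》), 2006, No. 2, pp. 74-76 (3 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: Natural Science Foundation of Hubei Province (2004ABA061)
Keywords: subject search engine; web spider; integrative crawling strategy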
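The abstract names the ingredients of the strategy (three filters plus a Shark-style heuristic ordering of the crawl frontier) but gives no formulas, so the following is only a minimal sketch of how such pieces might fit together, not the paper's method. Everything in it is an assumption made for illustration: the toy in-memory web `TOY_WEB`, the term-overlap `relevance` measure, the word-count proxy for page quality, and the parameters `decay`, `gamma`, `link_threshold`, `page_threshold`, and `min_words`.

```python
import heapq

# Hypothetical in-memory "web": url -> (page text, [(anchor text, target url), ...]).
TOY_WEB = {
    "seed": ("web crawler search engine portal",
             [("focused crawler tutorial", "a"), ("cooking recipes", "b")]),
    "a": ("a focused crawler downloads topic relevant pages for a search engine",
          [("crawler frontier design", "c")]),
    "b": ("pasta recipes and cooking tips for dinner", [("more recipes", "d")]),
    "c": ("crawler frontier priority queue design notes", []),
    "d": ("dessert recipes", []),
}


def relevance(text, topic_terms):
    """Toy similarity: fraction of topic terms that occur in the text."""
    words = set(text.lower().split())
    return sum(1 for term in topic_terms if term in words) / len(topic_terms)


def crawl(seed, topic_terms, web, decay=0.5, gamma=0.5,
          link_threshold=0.3, page_threshold=0.3, min_words=3):
    """Best-first crawl; returns the accepted URLs in visiting order."""
    # Frontier entries are (-priority, url, inherited score); heapq pops the
    # smallest tuple, so negating the priority visits the best link first.
    frontier = [(-1.0, seed, 1.0)]
    seen = {seed}
    accepted = []
    while frontier:
        _, url, inherited = heapq.heappop(frontier)
        page = web.get(url)
        if page is None:
            continue  # unreachable page in the toy web
        text, links = page
        page_score = relevance(text, topic_terms)
        # Assumed filter 2 (subject precision) and filter 3 (page quality,
        # approximated here by a minimum word count).
        if page_score >= page_threshold and len(text.split()) >= min_words:
            accepted.append(url)
        # Shark-style inheritance: a relevant page passes on a decayed share
        # of its own score, an irrelevant one a decayed share of what it got.
        child_inherited = decay * (page_score if page_score > 0 else inherited)
        for anchor, target in links:
            if target in seen:
                continue
            potential = (gamma * child_inherited
                         + (1 - gamma) * relevance(anchor, topic_terms))
            # Assumed filter 1 (link relevance): drop weak links before queuing.
            if potential >= link_threshold:
                seen.add(target)
                heapq.heappush(frontier, (-potential, target, child_inherited))
    return accepted


if __name__ == "__main__":
    print(crawl("seed", ["crawler", "search", "engine"], TOY_WEB))
```

In this toy run the link with the off-topic anchor text is filtered out before it reaches the frontier, so the pages behind it are never fetched. A real crawler would replace the word-count proxy with a link-analysis quality score and the term overlap with a proper text-similarity measure.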
