
Topic Crawler Algorithm With Link Structure (cited by: 4)
Abstract: Building on an analysis of the content-based Best-First link selection algorithm, this paper introduces the HITS (hyperlink-induced topic search) algorithm, which reflects the value of links, and proposes a new link selection strategy. By combining the two algorithms, the new crawler considers link structure as well as page content, so that downloaded pages are both topically relevant and authoritative, and the crawler's "short-sighted" behavior during the crawling stage is alleviated. Experimental results show that the new crawling strategy performs better than the Best-First algorithm used alone.
Source: Journal of Huaqiao University (Natural Science), indexed in CAS and the Peking University Core Journals list, 2017, No. 2, pp. 195-200 (6 pages)
Funding: Research Fund Project of the Fujian Provincial Department of Science and Technology (2011H6016)
Keywords: Best-First algorithm; link structure; HITS algorithm; crawling strategy
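
The abstract describes combining a content-based Best-First score with HITS authority when choosing which link to follow next, but this record does not give the paper's actual formula. The following is only a minimal sketch of the general idea under stated assumptions: a weighted sum with a hypothetical weight `alpha`, cosine similarity over raw term counts as the content score, and plain iterative HITS over the links seen so far. The function names and the toy data are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(text, topic):
    """Cosine similarity between raw term-count vectors of two strings."""
    a, b = Counter(text.lower().split()), Counter(topic.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hits(graph, iterations=20):
    """Iterative HITS on {page: [outlinked pages]}; returns (authority, hub)."""
    nodes = set(graph) | {d for targets in graph.values() for d in targets}
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub score of a page: sum of authority scores of the pages it links to.
        new_hub = {n: sum(new_auth[d] for d in graph.get(n, [])) for n in nodes}
        # L2-normalise so the scores stay bounded across iterations.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in new_auth.items()}
        hub = {n: v / h_norm for n, v in new_hub.items()}
    return auth, hub


def rank_frontier(pages, graph, topic, alpha=0.5):
    """Rank unvisited links by alpha * content score + (1 - alpha) * authority.

    `alpha` is an assumed mixing weight; alpha=1.0 reduces to plain Best-First,
    where a link inherits the topic similarity of the page it was found on.
    """
    auth, _ = hits(graph)
    max_auth = max(auth.values(), default=0.0) or 1.0
    best = {}
    for src, targets in graph.items():
        content = cosine_similarity(pages.get(src, ""), topic)
        for dst in targets:
            if dst in pages:  # already downloaded, not part of the frontier
                continue
            score = alpha * content + (1 - alpha) * auth[dst] / max_auth
            best[dst] = max(score, best.get(dst, 0.0))
    return sorted(best.items(), key=lambda item: -item[1])


# Toy example: two downloaded pages and three unvisited links.
pages = {"a": "web crawler topic search", "b": "cooking recipes"}
graph = {"a": ["x", "y"], "b": ["y", "z"]}
print([url for url, _ in rank_frontier(pages, graph, "topic crawler")])
# prints ['y', 'x', 'z']
```

In this toy data, pure Best-First (`alpha=1.0`) scores `x` and `y` identically because both are found on the same page; adding the authority term breaks the tie in favour of `y`, which is linked from two pages. That is one simple way link structure can offset the short-sightedness of a content-only score.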
