
Solve the cycle links problem in Internet crawling by directed graph

Cited by: 7
Abstract: This paper addresses the problem of cycle links in web crawling by modeling web pages as a directed graph. It first poses the problem of directed cycles formed by web pages, then gives a formal definition of the directed graph that web pages form, and finally presents an algorithm that uses this graph to find the directed cycles. The key problem for a crawler is finding directed cycles efficiently among the pages it has already crawled. The algorithm keeps the crawler from falling into the trap of a directed cycle made up of already-crawled pages.
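Source: Journal of Jilin University: Science Edition (《吉林大学学报(理学版)》), 2004, No. 3, pp. 402-404 (3 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (Grant No. 60373099).
Keywords: crawler; Internet search engine; hyperlink; directed graph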
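
The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea: treat each crawled page as a vertex, each hyperlink as a directed edge, and look for a directed cycle. It uses a standard depth-first search with three-color marking, which is not necessarily the authors' method. The function names `build_link_graph` and `find_directed_cycle` and the sample URLs are hypothetical, chosen for illustration.

```python
def build_link_graph(links):
    """Build adjacency lists from (source_url, target_url) hyperlink pairs."""
    graph = {}
    for source, target in links:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])  # make sure every page is a vertex
    return graph


def find_directed_cycle(graph):
    """Return one directed cycle as a list of pages, or None if none exists.

    Standard DFS coloring (an assumption, not taken from the paper):
    WHITE = unvisited, GRAY = on the current DFS path, BLACK = finished.
    An edge into a GRAY page closes a directed cycle.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {page: WHITE for page in graph}
    path = []  # pages on the current DFS path, in visiting order

    def visit(page):
        color[page] = GRAY
        path.append(page)
        for nxt in graph[page]:
            if color[nxt] == GRAY:
                # Back edge: the cycle is the path from nxt back to nxt.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[page] = BLACK
        return None

    for page in graph:
        if color[page] == WHITE:
            cycle = visit(page)
            if cycle:
                return cycle
    return None


links = [
    ("a.html", "b.html"),
    ("b.html", "c.html"),
    ("c.html", "a.html"),  # closes the cycle a -> b -> c -> a
    ("c.html", "d.html"),
]
print(find_directed_cycle(build_link_graph(links)))
```

For the sample links this prints `['a.html', 'b.html', 'c.html', 'a.html']`. A crawler could run such a check before following a link and skip any edge that would lead back into a page on its current path. The recursive form is kept for brevity; a crawl graph of realistic size would likely need an explicit stack to stay within Python's recursion limit.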

References (3)

  • [1] Aggarwal C C, Al-Garawi F, Yu P S. Intelligent crawling on the world wide web with arbitrary predicates [C]. Proceedings of the 10th International World Wide Web Conference. Hong Kong: ACM, 2001: 96-105.
  • [2] Bharat K, Broder A, Henzinger M, et al. The connectivity server: fast access to linkage information on the Web [C]. Proceedings of the 7th International World Wide Web Conference. Brisbane, Australia: Elsevier Science, 1998: 469-477.
  • [3] Heydon A, Najork M. Mercator: a scalable, extensible Web crawler [J]. World Wide Web, 1999, 2(4): 219-229.
