期刊文献+
共找到1篇文章
< 1 >
每页显示 20 50 100
On-line topical importance estimation:an effective focused crawling algorithm combining link and content analysis 被引量:6
1
作者 Can WANG Zi-yu GUAN +3 位作者 Chun CHEN Jia-jun BU Jun-feng WANG Huai-zhong LIN 《Journal of Zhejiang University-Science A(Applied Physics & Engineering)》 SCIE EI CAS CSCD 2009年第8期1114-1124,共11页
Focused crawling is an important technique for topical resource discovery on the Web.The key issue in focused crawling is to prioritize uncrawled uniform resource locators(URLs) in the frontier to focus the crawling o... Focused crawling is an important technique for topical resource discovery on the Web.The key issue in focused crawling is to prioritize uncrawled uniform resource locators(URLs) in the frontier to focus the crawling on relevant pages.Traditional focused crawlers mainly rely on content analysis.Link-based techniques are not effectively exploited despite their usefulness.In this paper,we propose a new frontier prioritizing algorithm,namely the on-line topical importance estimation(OTIE) algorithm.OTIE combines link-and content-based analysis to evaluate the priority of an uncrawled URL in the frontier.We performed real crawling experiments over 30 topics selected from the Open Directory Project(ODP) and compared harvest rate and target recall of the four crawling algorithms:breadth-first,link-context-prediction,on-line page importance computation(OPIC) and our OTIE.Experimental results showed that OTIE significantly outperforms the other three algorithms on the average target recall while maintaining an acceptable harvest rate.Moreover,OTIE is much faster than the traditional focused crawling algorithm. 展开更多
关键词 Focused crawlers topical crawlers PAGERANK Classifiers On-line topical importance estimation (OTIE) algorithm
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部