1 article found
A Survey about Algorithms Utilized by Focused Web Crawler
Authors: Yong-Bin Yu, Shi-Lei Huang, Nyima Tashi, Huan Zhang, Fei Lei, Lin-Yang Wu. Journal of Electronic Science and Technology (CAS, CSCD), 2018, No. 2, pp. 129-138 (10 pages).
Abstract: Focused crawlers (also known as subject-oriented crawlers), the core component of vertical search engines, collect as many topic-specific web pages as possible to build a subject-oriented corpus for subsequent data analysis or user queries. This paper shows that the popular algorithms used in focused web crawling fall mainly into two categories: webpage analyzing algorithms and crawling strategies (which prioritize the uniform resource locators (URLs) in the crawl queue). The first experiment compares the advantages and disadvantages of three crawling strategies and indicates that best-first search with an appropriate heuristic is a sound choice for topic-oriented crawling, while depth-first search is of little help in focused crawling. A second experiment compares improved versions of these strategies (each combined with a webpage analyzing algorithm) and verifies that crawling strategies alone are not very efficient for focused crawling; in most cases both kinds of algorithms are used together. In light of the experimental results and recent research, the paper suggests several trends for future research on focused crawler algorithms.
Keywords: crawling strategies; focused crawler; harvest rate; uniform resource locator (URL) prioritizing; webpage analyzing
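The abstract reports that best-first search with a suitable heuristic beats depth-first search for focused crawling, and the keywords name harvest rate (the fraction of crawled pages that turn out to be relevant) as the measure. The Python sketch below illustrates that kind of comparison; it is not the authors' code or experimental setup. The toy in-memory link graph TOY_WEB, the keyword-overlap relevance function, the 0.5 relevance threshold, and the rule that a discovered link inherits its parent page's score as its priority are all hypothetical assumptions chosen for illustration.

```python
import heapq

# Hypothetical toy web graph: URL -> (page text, outgoing links). No network access.
TOY_WEB = {
    "seed":   ("focused crawler search portal", ["sports", "tech"]),
    "sports": ("football scores and league tables", ["s1", "s2"]),
    "s1":     ("tennis results", ["s3"]),
    "s2":     ("golf news", []),
    "s3":     ("cricket updates", []),
    "tech":   ("search engine and focused crawler design", ["t1", "t2"]),
    "t1":     ("focused crawler heuristics", []),
    "t2":     ("vertical search crawler", []),
}

# Hypothetical topic description as a keyword set.
TOPIC = {"focused", "crawler", "search"}


def relevance(text, topic):
    """Assumed heuristic: fraction of topic keywords that occur in the page text."""
    words = set(text.lower().split())
    return len(words & topic) / len(topic)


def best_first_crawl(web, seed, topic, budget, threshold=0.5):
    """Best-first search: always expand the URL with the highest priority."""
    frontier = [(-1.0, 0, seed)]  # (negated priority, insertion counter, url); heapq is a min-heap
    seen = {seed}
    counter = 1
    crawled = []  # list of (url, is_relevant)
    while frontier and len(crawled) < budget:
        _, _, url = heapq.heappop(frontier)
        text, links = web[url]
        score = relevance(text, topic)
        crawled.append((url, score >= threshold))
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                # Assumption: a child link inherits its parent page's relevance as priority.
                heapq.heappush(frontier, (-score, counter, link))
                counter += 1
    return crawled


def depth_first_crawl(web, seed, topic, budget, threshold=0.5):
    """Depth-first search with an explicit stack, ignoring topic relevance when ordering."""
    stack = [seed]
    visited = set()
    crawled = []
    while stack and len(crawled) < budget:
        url = stack.pop()
        if url in visited:
            continue
        visited.add(url)
        text, links = web[url]
        crawled.append((url, relevance(text, topic) >= threshold))
        # Push in reverse so the first listed link is explored first.
        for link in reversed(links):
            if link in web and link not in visited:
                stack.append(link)
    return crawled


def harvest_rate(crawled):
    """Harvest rate = relevant pages crawled / total pages crawled."""
    if not crawled:
        return 0.0
    return sum(is_rel for _, is_rel in crawled) / len(crawled)


if __name__ == "__main__":
    for name, crawl in [("best-first", best_first_crawl), ("depth-first", depth_first_crawl)]:
        pages = crawl(TOY_WEB, "seed", TOPIC, budget=4)
        print(name, [url for url, _ in pages], harvest_rate(pages))
    # best-first ['seed', 'sports', 'tech', 't1'] 0.75
    # depth-first ['seed', 'sports', 's1', 's3'] 0.25
```

With a budget of four pages, the best-first crawler leaves the off-topic "sports" branch after one page because its children inherit a zero score, while the depth-first crawler keeps descending into that branch. This toy graph only shows the mechanism; it makes no claim about the magnitudes reported in the paper.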