Abstract
The explosive growth of information on the Internet and the increasingly individualized needs of users have made topic-specific search engines more and more popular. A focused web crawler is an important component of a topic-specific search engine: it downloads only those documents from the Web that relate to a given topic. A customizable focused web crawler is a topical crawler whose topic can be selected and customized. This paper presents a more effective set of crawling algorithms with three advantages: high efficiency (resources with higher topic relevance are downloaded first), low resource usage (the URL queue is kept shorter), and easy porting to new topics (the topic is customizable).
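The abstract does not give the algorithm itself, so the three properties it lists can only be illustrated loosely. The following is a minimal sketch, not the paper's method: it assumes a simple keyword-overlap score for topic relevance, and the names `relevance` and `BoundedFrontier` are hypothetical. It shows a URL queue that returns the most relevant URL first, drops the least relevant entry once a length cap is exceeded, and takes the topic as an ordinary parameter.

```python
import bisect
import itertools


def relevance(text, topic_terms):
    """Fraction of topic terms found in the text.

    Assumption: keyword overlap stands in for whatever relevance
    measure the paper actually uses.
    """
    if not topic_terms:
        return 0.0
    words = set(text.lower().split())
    return sum(1 for term in topic_terms if term in words) / len(topic_terms)


class BoundedFrontier:
    """URL queue ordered by relevance, with a fixed maximum length."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._items = []                  # kept sorted, lowest score first
        self._seen = set()                # URLs already queued once
        self._counter = itertools.count()

    def push(self, url, score):
        if url in self._seen:
            return
        self._seen.add(url)
        # The negated counter makes earlier URLs win ties on score.
        bisect.insort(self._items, (score, -next(self._counter), url))
        if len(self._items) > self.max_size:
            del self._items[0]            # evict the least relevant entry

    def pop(self):
        """Return the most relevant URL still in the queue."""
        return self._items.pop()[2]

    def __len__(self):
        return len(self._items)


# The topic is just data, so switching topics means passing a different set.
topic = {"crawler", "search"}
frontier = BoundedFrontier(max_size=2)
frontier.push("a", relevance("focused crawler for search engines", topic))
frontier.push("b", relevance("cooking recipes", topic))
frontier.push("c", relevance("web crawler basics", topic))
```

In a real crawler the text passed to `relevance` would typically be the anchor text or the content of the page that links to the URL; that choice is also an assumption here.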
Source
Electronic Science and Technology (《电子科技》)
2009, No. 1, pp. 47-50 (4 pages)
Keywords
information collection
search engine
web crawler