Design of a Specific Web Information-Collecting System Based on Mobile Crawler (times cited: 3)
Abstract: Search engines have become important tools for Web navigation. To provide powerful search facilities, search engines maintain comprehensive indices of the documents available on the Web. These indices are created and maintained by Web crawlers, which recursively traverse and download Web pages on behalf of search engines; the pages are analyzed and indexed only after they have been downloaded, and are then made available for retrieval. This paper presents an alternative, more efficient approach to building Web indices based on mobile crawlers. The proposed crawlers are first transferred to the site(s) where the data resides, so that any unwanted data can be filtered out locally before the remainder is transferred back to the search engine. This approach is particularly well suited to so-called "smart" crawling algorithms, which determine an efficient crawling path from the contents of the Web pages visited so far. The mobile crawler combines two technology trends, mobile computing and specialized search engines, and promises to solve the difficult problems faced by current general-purpose search engines.
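The abstract does not specify how the mobile crawler filters pages or chooses its path. The following minimal Python sketch illustrates the idea under stated assumptions: the remote site is simulated as an in-memory dictionary, relevance is a simple keyword-overlap score, and "smart" crawling is modeled as a best-first frontier in which each link inherits the score of the page it appears on. All names (`SITE`, `relevance`, `stationary_crawl`, `mobile_crawl`, `transferred_bytes`) are hypothetical, and in a real system `mobile_crawl` would execute on the remote host rather than in the search engine's own process.

```python
import heapq
from collections import deque

# Hypothetical remote site held in memory: URL -> (page text, outgoing links).
SITE = {
    "/": ("welcome to the campus portal", ["/cs", "/sports"]),
    "/cs": ("computer science: search engine and crawler research", ["/cs/ir", "/cs/news"]),
    "/cs/ir": ("mobile crawler for search engine indexing", []),
    "/cs/news": ("department news and events", []),
    "/sports": ("football scores and match reports", ["/sports/archive"]),
    "/sports/archive": ("old football results " * 20, []),
}


def relevance(text: str, topic: set[str]) -> float:
    """Crude relevance score (assumption): fraction of topic keywords present as words."""
    return len(set(text.split()) & topic) / len(topic)


def transferred_bytes(pages: dict[str, str]) -> int:
    """Bytes that would cross the network if these pages were sent to the search engine."""
    return sum(len(text.encode("utf-8")) for text in pages.values())


def stationary_crawl(site: dict, start: str) -> dict[str, str]:
    """Conventional crawler: download every reachable page; analysis happens afterwards."""
    seen, queue, shipped = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = site[url]
        shipped[url] = text  # every page is downloaded, relevant or not
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return shipped


def mobile_crawl(site: dict, start: str, topic: set[str],
                 threshold: float = 0.5, max_pages: int = 100) -> dict[str, str]:
    """Hypothetical mobile crawler running at the data source: best-first crawl plus local filtering."""
    frontier = [(-1.0, start)]  # heapq is a min-heap, so priorities are negated
    seen, visited, kept = {start}, 0, {}
    while frontier and visited < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = site[url]
        visited += 1
        score = relevance(text, topic)
        if score >= threshold:
            kept[url] = text  # only pages that pass the local filter are sent back
        for link in links:
            if link not in seen:
                seen.add(link)
                # "Smart" path selection: a link inherits the score of the page containing it,
                # so links found on relevant pages are explored first.
                heapq.heappush(frontier, (-score, link))
    return kept


if __name__ == "__main__":
    topic = {"search", "engine", "crawler"}
    full = stationary_crawl(SITE, "/")
    filtered = mobile_crawl(SITE, "/", topic, max_pages=4)
    print("stationary crawler shipped", len(full), "pages,", transferred_bytes(full), "bytes")
    print("mobile crawler shipped", len(filtered), "pages,", transferred_bytes(filtered), "bytes")
```

Because both the filter and the path decision run next to the data, irrelevant pages such as `/sports/archive` are never transferred, and with a page budget (`max_pages=4` above) they are not even visited. The sketch ignores the cost of shipping the crawler itself; whether that cost is outweighed by the pages saved depends on the site and the filter.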
Source: Computer Engineering and Applications (《计算机工程与应用》), 2003, No. 36, pp. 153-156 (4 pages). Indexed in CSCD; Peking University core journal.
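Funding: National Natural Science Foundation of China (Grant No. 60073030); Ministry of Education key project "Research on Key Technologies of Modern Distance Education"; Fujitsu research project.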
Keywords: Internet; search engine; World Wide Web; information-gathering system; design; mobile crawler