
A Design Scheme for a Focused Web Information Collection System (cited by: 2)

Fine design on focused Web crawler
Abstract: The rapid growth of the World Wide Web poses unprecedented scaling challenges for general-purpose crawlers, and small-scale focused (topic-specific) Web crawling has therefore become an active research topic in recent years. This paper introduces the basic principles of focused Web crawlers, along with their main functions and techniques. Based on an analysis of how topic-relevant pages are distributed across the Web, it proposes supplying the crawler with a high-quality seed set to improve crawling performance, which saves hardware and network resources and makes the collection easier to update.
Authors: Ou Ge (欧歌), Zhao Hengyong (赵恒永)
Source: Computer and Information Technology (《电脑与信息技术》), 2004, No. 6, pp. 52-55 (4 pages)
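
The abstract's central claim, that a better seed set lets a focused crawler find topic-relevant pages with fewer fetches, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy link graph `TOY_WEB`, the keyword set `TOPIC`, the `relevance` scoring rule, and the best-first strategy in `focused_crawl` are all assumptions chosen for illustration, and no network access is performed.

```python
import heapq

# Hypothetical toy web (an assumption, not data from the paper):
# URL -> (page text, outgoing links).
TOY_WEB = {
    "hub/sports": ("football league match score", ["sports/a", "sports/b", "news/x"]),
    "sports/a": ("football match report goal", ["sports/b", "sports/c"]),
    "sports/b": ("league table football score", ["sports/a"]),
    "sports/c": ("goal football transfer", []),
    "news/x": ("election budget parliament", ["news/y", "hub/sports"]),
    "news/y": ("weather forecast rain", ["news/z"]),
    "news/z": ("stock market shares", ["news/x"]),
    "portal": ("welcome portal links", ["news/y", "news/z"]),
}

# Hypothetical topic description: a plain keyword set.
TOPIC = {"football", "match", "league", "goal", "score"}


def relevance(text):
    """Return the fraction of words in the page that are topic keywords."""
    words = text.split()
    return sum(w in TOPIC for w in words) / len(words) if words else 0.0


def focused_crawl(seeds, budget, threshold=0.5):
    """Best-first crawl limited to `budget` fetches.

    Each newly discovered link inherits the relevance score of the page
    that links to it, so links found on topical pages are fetched first.
    Returns (relevant URLs in fetch order, number of pages fetched).
    """
    # heapq is a min-heap, so priorities are negated to pop the best first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    fetched, relevant = 0, []
    while frontier and fetched < budget:
        _, url = heapq.heappop(frontier)
        text, links = TOY_WEB[url]  # stands in for an HTTP fetch
        fetched += 1
        score = relevance(text)
        if score >= threshold:
            relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return relevant, fetched


if __name__ == "__main__":
    print(focused_crawl(["hub/sports"], budget=4))  # topical seed
    print(focused_crawl(["portal"], budget=4))      # generic seed
```

On this toy graph, with the same budget of four fetches, the topical seed `hub/sports` yields three relevant pages, while the generic seed `portal` spends the whole budget on off-topic pages and yields none. The effect depends on topic-relevant pages being clustered around the seed, which is the distribution property the abstract says the paper analyzes.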
