Topic-Specific Web Information Collection Technology
Abstract: With the rapid growth of the Internet, search engines have become the main tool people use to obtain information on the Web. A traditional general-purpose search engine uses a crawler program to collect information from the entire Web. This approach has several drawbacks: collection is untargeted, a high proportion of indexed pages are dead or out of date, and the results cannot meet the needs of specific professional groups. What is needed instead is a topic-oriented (focused) search engine with fine-grained, accurate classification, comprehensive and in-depth data, and timely updates.
Author: Du Huan (杜欢)
Source: Journal of Sichuan University of Science & Engineering (Natural Science Edition), indexed in CAS, 2007, No. 5, pp. 10-13 (4 pages)
Keywords: search engine; Web crawler; focused search engine