
A Study of Ontology-based Web Crawler

Cited by: 7
Abstract: The Web, the world's largest unstructured database, has greatly improved access to information. However, information on the Web is largely unorganized, and the distributed nature of the Web makes it difficult to use for information and knowledge management. Users exploring the Web therefore need the support of an intelligent information discovery mechanism. After analyzing how crawlers work and reviewing traditional crawling algorithms, this paper proposes an information discovery framework built on an ontology-based web crawler. The framework comprises a preprocessing module and an ontology management module and defines a strategy for computing the relevance of web pages. An empirical evaluation of the framework shows promising results.
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), CSSCI, PKU Core Journal, 2007, Issue 5, pp. 723-727 (5 pages)
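Funding: National Natural Science Foundation of China (60573056); Key Project of the Natural Science Foundation of Zhejiang Province (Z106335); Natural Science Foundation of Zhejiang Province (Y105625).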
Keywords: ontology, web crawler, semantic web, information retrieval

