
An Ontology-based Approach to Topic-specific Web Resource Discovery (基于Ontology的面向主题的网络信息采集算法)
Times cited: 6
Abstract: This paper summarizes the strengths and weaknesses of three traditional approaches to topic-specific web resource discovery. These are crawling based on page-content evaluation, crawling based on link-structure evaluation, and crawling based on reinforcement learning. It then introduces a method for building a topic Ontology from a dictionary, which speeds up Ontology construction and saves users considerable time. Finally, building on an analysis of the traditional crawling algorithms, it proposes a new Ontology-based, topic-oriented web page crawling algorithm, and experiments demonstrate its advantages.
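The abstract does not spell out the crawling algorithm itself. The following is therefore only a minimal Python sketch of the general idea behind an Ontology-guided best-first (focused) crawler, and it rests on several assumptions:

- The topic Ontology is flattened into a hypothetical concept-to-weight table, and synonyms taken from a dictionary share a concept's weight.
- Page relevance is a simple length-normalized sum of term weights.
- A link's frontier priority is its parent page's relevance plus the relevance of its anchor text.

The in-memory `pages`/`links` graph, the `relevance` and `focused_crawl` helpers, and the threshold are all illustrative. None of them is the paper's method.

```python
import heapq

# Hypothetical topic Ontology flattened to concept -> weight.
# "spider" stands in for a synonym drawn from a dictionary (assumption).
ONTOLOGY = {
    "crawler": 1.0,
    "spider": 1.0,
    "ontology": 0.8,
    "rdfs": 0.6,
    "semantic": 0.5,
}


def relevance(text: str, ontology: dict[str, float]) -> float:
    """Sum the weights of Ontology terms in the text, normalized by token count."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(ontology.get(tok, 0.0) for tok in tokens) / len(tokens)


def focused_crawl(seeds, pages, links, ontology, max_pages=10, threshold=0.1):
    """Best-first crawl over an in-memory web graph (illustrative only).

    pages: url -> page text
    links: url -> list of (target_url, anchor_text)
    Returns (url, score) for pages scoring >= threshold, in visit order.
    """
    # heapq is a min-heap, so priorities are negated; seeds start highest.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    visited = set()
    relevant = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in pages:
            continue
        visited.add(url)
        page_score = relevance(pages[url], ontology)
        if page_score >= threshold:
            relevant.append((url, page_score))
        # Links are still followed from off-topic pages, just with low priority.
        for target, anchor in links.get(url, []):
            if target not in visited:
                priority = page_score + relevance(anchor, ontology)
                heapq.heappush(frontier, (-priority, target))
    return relevant


# Toy web graph (hypothetical data).
pages = {
    "a": "web crawler and spider design",
    "b": "ontology and rdfs for semantic web",
    "c": "cooking recipes for dinner",
    "d": "spider ontology crawler",
}
links = {
    "a": [("b", "semantic ontology"), ("c", "recipes")],
    "b": [("d", "crawler")],
    "c": [],
}
result = focused_crawl(["a"], pages, links, ONTOLOGY)
print(result)
```

The crawl follows predicted relevance rather than discovery order. In the toy graph, page `c` is discovered before `d`, yet `d` is visited first because it is reached through the on-topic page `b` via an on-topic anchor. Page `c` is then fetched last and rejected as off-topic.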
Source: Library and Information Service (《图书情报工作》), indexed in CSSCI and the Peking University core journal list. 2006, No. 5, pp. 78-82 (5 pages).
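Funding: One of the research outcomes of the 2004 Zhejiang Provincial Natural Science Foundation project "Semantic Information Search and Mining for E-commerce" (Project No. M063149).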
Keywords: web page crawling; Ontology; RDFS
