
An Artificial Method-Based System for Exact Web Information Extraction (一种基于人工策略的WEB信息精确提取系统)

Cited by: 2
Abstract: With the growth of the Internet, extracting content of interest from the vast amount of information on the Web has become an important problem. Traditional information extraction methods based on keyword retrieval are suited to relatively complex information environments. For the extraction of specific information, this paper proposes a method that uses the DOM tree and HTML tags to precisely extract large volumes of information in a specific format. Experimental results show that the method achieves a 100% exact extraction rate in applications that extract specific Web information.
Author: Liu Ling (刘玲)
Source: Journal of Southwest University of Science and Technology (《西南科技大学学报》), CAS, 2009, No. 2, pp. 49-52 (4 pages)
Funding: National 863 Program of China (2003AA116060)
Keywords: information extraction; artificial method; DOM
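
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it names: walking the HTML tag structure and pulling out the text of nodes that match a hand-written rule. It is not the paper's system. The names `RuleExtractor` and `extract`, the rule format (a list of tag names that must close the path from the root to the node), and the sample page are all assumptions made for illustration. Only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the path.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class RuleExtractor(HTMLParser):
    """Collect the text of every element whose tag path ends with `rule`.

    `rule` is a hypothetical hand-written pattern such as
    ["table", "tr", "td"]; it is an assumption of this sketch.
    """

    def __init__(self, rule):
        super().__init__()
        self.rule = list(rule)
        self.path = []        # open tags from the root to the current node
        self.records = []     # extracted text, one entry per matching node
        self._buffer = None   # text collected for the node being captured
        self._depth = 0       # path length at which the capture started

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.path.append(tag)
        # Start capturing when the end of the path equals the rule.
        if self._buffer is None and self.path[-len(self.rule):] == self.rule:
            self._buffer = []
            self._depth = len(self.path)

    def handle_endtag(self, tag):
        if tag not in self.path:
            return
        # Pop up to and including the nearest open tag with this name,
        # which also tolerates unclosed tags inside it.
        while self.path.pop() != tag:
            pass
        # The captured node is closed once the path is shorter than
        # it was when the capture started.
        if self._buffer is not None and len(self.path) < self._depth:
            self.records.append("".join(self._buffer).strip())
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def extract(html, rule):
    """Return the text of all nodes in `html` matching the tag-path rule."""
    parser = RuleExtractor(rule)
    parser.feed(html)
    parser.close()
    return parser.records


# Made-up sample page: table cells hold the target data, <p> is noise.
SAMPLE = (
    "<html><body><table>"
    "<tr><td>Title A</td><td>2009</td></tr>"
    "<tr><td>Title B<br>(reprint)</td><td>2010</td></tr>"
    "</table><p>unrelated text</p></body></html>"
)

cells = extract(SAMPLE, ["table", "tr", "td"])
```

Because the rule is tied to a page's tag layout rather than to keywords, it returns exactly the targeted cells on pages that share that layout. The trade-off is that a rule has to be written by hand for each page format.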


