
A Comparative Study of Description Languages for Document-Structure-Based Information Extraction Rules

Describe Languages' Comparing for Web Information Extraction Rules Based on Page Structure
Abstract: Many information extraction tools are based on document structure; XWrap, W4F, Lixto, and PQagent (developed by the authors) are among the more representative. These tools describe their extraction rules in different ways: XWrap, W4F, and Lixto each use a custom rule description format, whereas PQagent uses the general-purpose standard XQuery. This paper compares the rule description formats of XWrap, W4F, and Lixto with the XQuery used by PQagent and argues that XQuery is the better choice for describing extraction rules.
Source: Journal of Hebei University (Natural Science Edition), CAS, 2004, No. 2, pp. 212-218 (7 pages)
Keywords: information extraction; extraction rule; description language; XQuery

References (12)

  • 1 BAUMGARTNER R, FLESCA S, GOTTLOB G. Visual Web information extraction with Lixto [Z]. Proceedings of the International Conference on Very Large Data Bases, Roma, Italy, 2001.
  • 2 BAUMGARTNER R, FLESCA S, GOTTLOB G. Wrapper generation with Lixto [Z]. Proceedings of the International Conference on Very Large Data Bases, Roma, Italy, 2001.
  • 3 LIU L, PU C, HAN W. XWRAP: An XML-enabled wrapper construction system for Web information sources [Z]. ICDE, San Diego, 2000.
  • 4 AZAVANT F, SAHUGUET A. World Wide Web Wrapper Factory (W4F) user manual [Z]. August 15, 2000.
  • 5 LAENDER A, RIBEIRO-NETO B, SILVA A. A brief survey of web data extraction tools [J]. SIGMOD Record, 2002, 31(2): 84-93.
  • 6 MENG XIAOFENG, LU HONGJUN, WANG HAIYAN, et al. Data extraction from the Web based on pre-defined schema [J]. J Computer Sci & Technol, 2002, 17(4): 377-388.
  • 7 LIU L, PU C, HAN W, BUTTLER, et al. XWRAP: An XML-based wrapper generation toolkit [D]. Beaverton, Oregon: OGI/CSE, October 1998.
  • 8 SAHUGUET A, AZAVANT F. Building light-weight wrappers for legacy Web data-sources using W4F [Z]. Proceedings of the International Conference on Very Large Data Bases, Edinburgh, Scotland, 1999.
  • 9 SAHUGUET A, AZAVANT F. Web Ecology: Recycling HTML pages as XML documents using W4F [Z]. In Second International Workshop on the Web and Databases, Philadelphia, Pennsylvania, USA, 1999.
  • 10 GOTTLOB G, KOCH C. Monadic datalog and the expressive power of languages for Web information extraction [Z]. In Proc. ACM PODS, Madison, Wisconsin, 2002.
