
Web表格信息抽取研究综述 (cited by 11)

A Survey of the Research on Information Extraction over Web Tables
Abstract: This paper introduces the characteristics and structure of Web tables and describes information extraction over Web tables and its process. It then analyzes four key technologies: Web table detection, Web table structure recognition, Web table interpretation (content integration), and representation of the extraction results. It also reviews applications of Web table information extraction. Finally, it points out the shortcomings of current research in China and abroad and outlines directions for future work.
Source: New Technology of Library and Information Service (《现代图书情报技术》), CSSCI, PKU Core Journal, 2008, No. 3, pp. 24-31 (8 pages)
Keywords: Web tables; information extraction; Web table detection; Web table structure recognition; Web table interpretation

References (39 in total; first 10 listed)

  • 1. Gatterbauer W, Bohunsky P. Table Extraction Using Spatial Reasoning on the CSS2 Visual Box Model[C]. In: Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006), Washington: AAAI Press, 2006: 1313-1318.
  • 2. Douglas S, Hurst M. Layout and Language: List and Tables in Technical Documents[C]. In: Proceedings of ACL SIGPARSE Workshop on Punctuation in Computational Linguistics, New Jersey: Association for Computational Linguistics, 1996: 19-24.
  • 3. Hu J, Kashi R S, Lopresti D, et al. Evaluating the Performance of Table Processing Algorithms[J]. International Journal on Document Analysis and Recognition, 2002, 4(3): 140-153.
  • 4. Ng H T, Kim C Y, Koo J L T. Learning to Recognize Tables in Free Texts[C]. In: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, New Jersey: Association for Computational Linguistics, 1999: 443-450.
  • 5. Wang Y, Haralick R, Phillips I. Document Zone Content Classification and Its Performance Evaluation[J]. Pattern Recognition, 2006, 39(1): 57-73.
  • 6. Wang Y, Phillips I T, Robert R M, et al. Table Structure Understanding and Its Performance Evaluation[J]. Pattern Recognition, 2004, 37(7): 1479-1497.
  • 7. McCallum A, Freitag D, Pereira F. Maximum Entropy Markov Models for Information Extraction and Segmentation[C]. In: Proceedings of the 17th International Conference on Machine Learning, 2000: 591-598.
  • 8. Pinto D, McCallum A, Wei X, et al. Table Extraction Using Conditional Random Fields[C]. In: Proceedings of the ACM SIGIR, 2003: 235-242.
  • 9. Hammer J, Garcia M H, Cho J, et al. Extracting Semi-structured Information From the Web[C]. In: Proceedings of the Workshop on Management of Semistructured Data, 1997: 18-25.
  • 10. Lim S, Ng Y. An Automated Approach for Retrieving Hierarchical Data from HTML Tables[C]. In: Proceedings of the 8th International Conference on Information and Knowledge Management (CIKM '99), 1999: 466-474.

