
Research and Implementation of Web Table Positioning Technology (Web表格定位技术的研究与实现)

Cited by: 9
Abstract: Locating Web tables is an important part of Web table extraction and is attracting growing attention. Based on the structural tags of Web tables and a set of user-defined heuristic rules, this paper locates tables by resolving <TABLE> nesting, judging whether a data table is complete, and traversing the <TABLE> tree.
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2009, No. 9, pp. 227-230 (4 pages)
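Funding: Supported by the National Natural Science Foundation of China (60575035) and the Shanghai Key Discipline Construction Project (J50103)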
Keywords: DOM tree; table positioning; heuristic rules; <TABLE> nesting; traversal
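
The abstract names three steps: resolving <TABLE> nesting, judging data-table completeness, and traversing the <TABLE> tree. The following is only a minimal sketch of how such a pipeline might look, not the paper's implementation. The class names, the function names, and the specific heuristic (a data table is a leaf table with at least two rows, at least two cells in every row, and some text) are all assumptions made for illustration; the paper's actual rules are not given in this record.

```python
from html.parser import HTMLParser


class TableNode:
    """One <table> element in a tree of nested tables (hypothetical helper)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []  # tables nested directly inside this one
        self.rows = []      # number of cells in each of this table's own rows
        self.text_len = 0   # amount of non-whitespace-trimmed text owned by this table


class TableTreeBuilder(HTMLParser):
    """Builds a tree of <table> elements; nesting is tracked with a cursor."""

    def __init__(self):
        super().__init__()
        self.root = TableNode()  # virtual root that holds top-level tables
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            node = TableNode(self.current)
            self.current.children.append(node)
            self.current = node
        elif tag == "tr" and self.current is not self.root:
            self.current.rows.append(0)
        elif tag in ("td", "th") and self.current.rows:
            self.current.rows[-1] += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.current is not self.root:
            self.current = self.current.parent

    def handle_data(self, data):
        if self.current is not self.root:
            self.current.text_len += len(data.strip())


def is_data_table(node, min_rows=2, min_cols=2):
    """Assumed heuristic, not the paper's rule set."""
    return (
        not node.children
        and len(node.rows) >= min_rows
        and all(cells >= min_cols for cells in node.rows)
        and node.text_len > 0
    )


def locate_data_tables(html):
    """Depth-first traversal of the table tree, returning candidate data tables."""
    builder = TableTreeBuilder()
    builder.feed(html)
    builder.close()
    found = []
    stack = list(reversed(builder.root.children))
    while stack:
        node = stack.pop()
        if is_data_table(node):
            found.append(node)
        stack.extend(reversed(node.children))
    return found
```

In this sketch an outer layout table that merely wraps another table is skipped because it has children, while the innermost table is tested against the completeness heuristic. A real system would likely need additional rules (for example for `rowspan`/`colspan` or for tables used as navigation menus).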


