
Location Technology of Non-standardized Table Based on DOM Tree (cited by: 2)
Abstract: Extracting information from Web tables has become an important part of ontology construction: it automatically extracts the attribute names and attribute values an ontology needs, saving a large amount of manual labor. Relatively few studies, in China or abroad, address information extraction from non-standardized tables, and this gap leaves much information missing when ontologies are built. This paper proposes a heuristic-rule-based algorithm for locating non-standardized tables, which locates such tables with relatively high accuracy.
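The abstract names the approach (heuristic rules applied to a DOM tree) but does not list the rules themselves. The Python sketch below is purely illustrative and is not the authors' algorithm. It builds a simplified DOM tree with the standard-library `html.parser`, then applies two hypothetical rules to every `<table>` node. Rule 1 requires at least `min_rows` rows and a typical (most common) row width of at least `min_cols` cells. Rule 2 requires that at least a `regularity` share of the rows have that typical width. For each table that passes, it also reports whether any `<th>` header cells are present. The thresholds, the rules, and the names `DomBuilder`, `rows_of` and `locate_tables` are all assumptions made for this example.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not kept open as parents.
VOID = {"br", "img", "hr", "input", "meta", "link", "col", "wbr"}


class Node:
    """A minimal DOM node: tag name, parent link and child list."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class DomBuilder(HTMLParser):
    """Builds a simplified DOM tree using the standard-library HTML parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def iter_nodes(node):
    """Depth-first traversal of the DOM tree."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def rows_of(table):
    """Collect the <tr> rows that belong to this table, skipping nested tables."""
    rows, stack = [], list(table.children)
    while stack:
        node = stack.pop()
        if node.tag == "tr":
            rows.append(node)
        elif node.tag != "table":  # a nested table is analysed on its own
            stack.extend(node.children)
    return rows


def locate_tables(html, min_rows=2, min_cols=2, regularity=0.8):
    """Apply two hypothetical heuristic rules to every <table> node."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    found = []
    for node in iter_nodes(builder.root):
        if node.tag != "table":
            continue
        rows = rows_of(node)
        if len(rows) < min_rows:
            continue  # Rule 1: too few rows to hold attribute/value pairs
        widths = [sum(c.tag in ("td", "th") for c in r.children) for r in rows]
        width, count = Counter(widths).most_common(1)[0]
        if width < min_cols:
            continue  # Rule 1: the typical row is too narrow
        if count / len(rows) < regularity:
            continue  # Rule 2: row widths are too irregular
        has_header = any(c.tag == "th" for r in rows for c in r.children)
        found.append({"rows": len(rows), "cols": width, "has_header": has_header})
    return found


# Hypothetical page: a one-row layout table wrapping an attribute/value table.
page = """
<table><tr><td><table>
  <tr><td>Model</td><td>X1</td></tr>
  <tr><td>Price</td><td>99</td></tr>
  <tr><td>Color</td><td>Red</td></tr>
</table></td></tr></table>
"""
print(locate_tables(page))  # → [{'rows': 3, 'cols': 2, 'has_header': False}]
```

On this sample, the outer one-row layout table fails Rule 1, while the nested three-row attribute/value table passes both rules and is reported without header markup. A real system would tune or replace these rules; the sketch only shows how heuristic checks can be attached to nodes of a DOM tree.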
Authors: Zhang Xinglan (张兴兰), Liu Yan (刘岩)
Source: Software Guide (《软件导刊》), 2016, No. 7, pp. 10-13 (4 pages)
Keywords: ontology; non-standardized table; DOM tree

References (6)

  • [1] HURST M. Classifying table elements in HTML [EB/OL]. http://www2002.org/CDROM/poster/115/index.html.
  • [2] WANG Y, HU J. A machine learning based approach for table detection on the Web [C]// Proceedings of the 11th International Conference on WWW, 2002: 242-250.
  • [3] CUI TAO. Schema matching and data extraction over HTML tables [D]. USA: Brigham Young University, 2003.
  • [4] CHEN H. Mining tables from large scale HTML texts [C]// Proceedings of the 18th International Conference on Computational Linguistics, 2000: 166-172.
  • [5] CHEN H H, TSAI S C, TSAI J H. Mining tables from large scale HTML texts [C]// Proceedings of the 18th International Conference on Computational Linguistics (COLING), 2000: 166-172.
  • [6] GAIZAUSKAS ROBERT, YORICK WILKS. Information extraction: beyond document retrieval [J]. Journal of Documentation, 1998, 54(1): 70-105.

Co-cited literature: 13

Citing literature: 2

Second-level citing literature: 1
