Information Extraction on Web Tables Based on Information Ratio (基于信息量的Web表格信息抽取方法)

Cited by: 2
Abstract: This paper proposes a Web table information extraction model based on effective information content. The model consists of two main modules: table positioning and table information extraction. The positioning module identifies the theme table from the content features of the Web tables. The extraction module cleans up the table and segments it into an attribute area and a value area by checking formatting and syntactic features. Experimental results show that the model performs well at extracting information from tables in Web documents.
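Source: Journal of Southwest China Normal University (Natural Science Edition) (《西南师范大学学报(自然科学版)》), 2010, No. 4, pp. 159-163 (5 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: Science and Technology Research Project of the Chongqing Municipal Education Commission (KJ091309)
Keywords: Web table; effective information ratio; Document Object Model (DOM); information extraction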
  • Related literature

References: 7

Secondary references: 26

  • 1. Hu Dongdong, Meng Xiaofeng. A tree-structure-based method for automatic Web data extraction [J]. Journal of Computer Research and Development, 2004, 41(10): 1607-1613. Cited by: 21
  • 2. Zhang Rui, Li Shijun. Automatic conversion of Web table data to XML [J]. Computer Engineering and Applications, 2007, 43(2): 190-192. Cited by: 5
  • 3. Lim S J, Ng Y K, Yang X. Integrating HTML Tables Using Semantic Hierarchies and Meta-data Sets [C]//Proc. of International Symposium on Database Engineering and Applications. [S.l.]: IEEE Press, 2002: 160-169.
  • 4. Jung S W, Kwon H C. A Scalable Hybrid Approach for Extracting Head Components from Web Tables [J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(2): 174-187.
  • 5. Li Shijun, Liu Mengchi, Peng Zhiyong. Wrapping HTML Tables into XML [C]//Proc. of the 5th International Conference on Web Information Systems Engineering. Brisbane, Australia: Springer, 2004: 147-152.
  • 6. DOM Interest Group. Document Object Model (DOM) [EB/OL]. http://www.w3.org/DOM/, 2006-06-12.
  • 7. Valter Crescenzi, Giansalvatore Mecca. Automatic Information Extraction From Large Websites [J]. Journal of the ACM, 2004, 51(5): 731-779.
  • 8. Valter Crescenzi, Paolo Merialdo, Paolo Missier. Fine-Grain Web Site Structure Discovery [A]. Proceedings of the Fifth ACM International Workshop on Web Information and Data Management [C]. New Orleans: ACM Press, 2003. 382-397.
  • 9. Chen J L, Zhou B Y, Shi J, et al. Function Based Object Model Towards Website [A]. Hong Kong: 10th International World Wide Web Conference [C]. 2001. 587-596.
  • 10. Kushmerick N, Weld D, Doorenbos R. Wrapper induction for information extraction [C]//Proc IJCAI, 1997.

Co-citing documents: 26

Co-cited documents: 13

Citing documents: 2

Secondary citing documents: 6
