
Web Extraction Method Based on DOM Tree and Domain Ontology
Abstract: To automatically extract data regions and data records from Deep Web result pages with heterogeneous structures, this paper proposes a Web extraction method based on the DOM tree and a domain ontology. The method labels DOM tree nodes using features of the data content together with a domain ontology library, locates the data region according to the layout conventions of result pages, and then applies an improved simple tree matching algorithm to locate the data region and identify the data records. Experimental results show that the F-measure of this method for locating data regions and data records is 2.93% to 6.67% higher than that of traditional extraction methods.
Source: Computer Engineering, 2012, No. 5, pp. 56-58 (3 pages); indexed in CAS and CSCD.
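Funding: National Natural Science Foundation of China (60970015, 61003054); Jiangsu Province Enterprise Doctoral Innovation Fund (BK2009563); Natural Science Research Fund of Jiangsu Higher Education Institutions (10KJB520018); Suzhou Special Fund for Technological Innovation of Science and Technology Enterprises (SG201043).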
Keywords: automatic extraction; DOM tree; domain ontology; data area positioning; simple tree matching
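The abstract identifies data records with an "improved simple tree matching algorithm" but does not spell out the improvement. For orientation only, here is a minimal Python sketch of the standard, unimproved simple tree matching recurrence. It counts the largest set of node pairs that two ordered trees can share without node replacement or level crossing, so sibling records in one data region tend to score high against each other. The `Node` class, the function names, and the normalization by average tree size are illustrative assumptions rather than the paper's code, and the ontology-based node labeling is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal DOM-like node: a tag name plus ordered children."""
    tag: str
    children: list["Node"] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Size of the maximum matching between trees a and b under the
    standard simple tree matching recurrence (not the paper's variant)."""
    if a.tag != b.tag:
        return 0  # differing roots: nothing below them can be matched
    m, n = len(a.children), len(b.children)
    # M[i][j]: best matching between the first i children of a
    # and the first j children of b (order-preserving, like an LCS).
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = simple_tree_matching(a.children[i - 1], b.children[j - 1])
            M[i][j] = max(M[i][j - 1], M[i - 1][j], M[i - 1][j - 1] + w)
    return M[m][n] + 1  # +1 for the matched pair of roots


def count_nodes(t: Node) -> int:
    """Total number of nodes in the tree rooted at t."""
    return 1 + sum(count_nodes(c) for c in t.children)


def normalized_similarity(a: Node, b: Node) -> float:
    """Assumed normalization: match count divided by the average tree size."""
    return simple_tree_matching(a, b) / ((count_nodes(a) + count_nodes(b)) / 2)


if __name__ == "__main__":
    # Two hypothetical product records: image link plus one or two text spans.
    rec1 = Node("div", [Node("a", [Node("img")]), Node("span"), Node("span")])
    rec2 = Node("div", [Node("a", [Node("img")]), Node("span")])
    print(simple_tree_matching(rec1, rec2))             # 4
    print(round(normalized_similarity(rec1, rec2), 3))  # 0.889
```

In a record-detection setting, adjacent subtrees whose normalized score exceeds some threshold would be grouped as candidate records; the threshold and how the paper's improved variant adjusts the recurrence are not given here.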

References

  • [1] Bergman M K. The Deep Web: Surfacing Hidden Value[J]. The Journal of Electronic Publishing, 2001, 7(1): 8912-8914.
  • [2] Yang Zhou, Zhuo Lin, Zhao Pengpeng, Cui Zhiming. An Automatic Extraction Method for Product Data Records[J]. Computer Engineering, 2010, 36(23): 262-265.
  • [3] Bille P. A Survey on Tree Edit Distance and Related Problems[J]. Theoretical Computer Science, 2005, 337(1-3): 217-239.
  • [4] Zhai Yanhong, Liu Bing. Web Data Extraction Based on Partial Tree Alignment[C]//Proc. of the 14th International Conference on World Wide Web. New York, USA: ACM Press, 2005: 76-85.
  • [5] Liu Bing. Web Data Mining[M]. Berlin, Germany: Springer, 2009.
  • [6] Liu Dan, Xie Qingsheng, Gu Xinjian. Research on Product Ontology Construction Technology in the E-commerce Environment[J]. Journal of Computer Applications, 2007, 27(3): 752-755.
