A Constraint-Based Method for Extracting Semistructured Information (cited by: 2)

EXTRACTING SEMISTRUCTURED INFORMATION FROM WEB
Abstract To integrate and query the irregular, dynamic information on the Web in a database fashion, this paper uses the Object Exchange Model (OEM) to build an information model of the Web. To represent each part of a page as a corresponding OEM object, the paper (1) designs an algorithm that extracts semistructured information from HTML pages; (2) defines a data extracting format that satisfies a set of constraints, together with a candidate algorithm that outputs the correct extracting format; and (3) reports test results. The method can extract both structured and semistructured information and is more general than existing extraction methods.
Authors 黄豫清 邹涛
Source 《计算机应用与软件》 (Computer Applications and Software), indexed in CSCD and the Peking University Core Journals list, 2002, No. 1, pp. 53-59 (7 pages)
Keywords data extracting format; OEM model; data extracting format constraint; semistructured information; database
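
The abstract's idea of representing each part of a page as an OEM object can be illustrated with a minimal sketch. This is not the paper's algorithm, and the constraint-based extracting format is not reproduced here. It assumes a simplified OEM object made of a label and a value (either an atomic string or a set of sub-objects, with object identifiers omitted), and the names `OEMObject`, `RowExtractor`, `extract`, and the field labels are hypothetical.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Union


@dataclass
class OEMObject:
    """Simplified OEM object: a label plus an atomic or set value (assumption)."""
    label: str
    value: Union[str, List["OEMObject"]]

    @property
    def kind(self) -> str:
        # A list value models an OEM set; anything else is treated as atomic.
        return "set" if isinstance(self.value, list) else "string"


class RowExtractor(HTMLParser):
    """Turns each <tr> into a 'row' object whose <td> cells get the given labels."""

    def __init__(self, labels: List[str]):
        super().__init__()
        self.labels = labels
        self.rows: List[OEMObject] = []
        self._cells = None      # cell texts of the row being read, or None
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._in_cell = True
            self._cells.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            fields = [OEMObject(label, text.strip())
                      for label, text in zip(self.labels, self._cells)]
            self.rows.append(OEMObject("row", fields))
            self._cells = None


def extract(html: str, labels: List[str], root_label: str = "page") -> OEMObject:
    """Wrap all extracted rows in one root object for the page."""
    parser = RowExtractor(labels)
    parser.feed(html)
    parser.close()
    return OEMObject(root_label, parser.rows)


if __name__ == "__main__":
    sample = ("<table><tr><td>Book A</td><td> 12 </td></tr>"
              "<tr><td>Book B</td><td>30</td></tr></table>")
    page = extract(sample, ["title", "price"])
    for row in page.value:
        print({field.label: field.value for field in row.value})
```

The nested label/value shape is what lets irregular pages be queried uniformly: a row with a missing cell simply yields a set with fewer sub-objects, with no fixed schema to violate.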
