Research on Entity-Based Matching Technology Between Text and XML
Abstract: Many organizations, such as aircraft manufacturers, store large amounts of data in XML, yet these data have little connection to other business text data. In the field of heterogeneous data integration, schema matching between text data and XML documents has received little attention. This paper proposes a method for matching text data with XML documents. The method uses a two-stage algorithm: first, an entity extraction algorithm based on conditional random fields extracts entity information from the text document; then, the Entity-based Closest Semantic Fragment (ECSF) search algorithm queries the XML tree for the closest semantic fragment that covers all extracted entities and instances, and this fragment serves as the matching object. In the ECSF search algorithm, the entity-based closest semantic fragment is the smallest subtree of the XML tree that covers all entity and instance information, with the additional requirement that the entity corresponding to an instance must be an ancestor node of that instance. Experiments verify the feasibility and effectiveness of the proposed method and show good matching results in terms of recall and accuracy.
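Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 11, pp. 2473-2478 (6 pages)
Funding: Supported by the Shanghai Key Project for High-Tech Industrialization (11-43) and the National Industry Special Project (CHIN-ARE2015-04-07)
Keywords: XML; matching technique; entity extraction; entity-based closest semantic fragment; ECSF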
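
The ECSF definition in the abstract (the smallest subtree covering every entity and instance, where each instance sits under its entity) can be illustrated with a small sketch. This is not the paper's algorithm, only a brute-force reading of the definition under stated assumptions: entities are taken to be element tags, instances are taken to be element text, and the function names `covers` and `closest_fragment` as well as the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def covers(node, entity, instance):
    """True if the subtree of `node` has an `entity` element containing `instance` as text.

    Assumption: an entity is an element tag and an instance is element text.
    """
    for ent in node.iter(entity):  # node itself (if the tag matches) and descendants
        for sub in ent.iter():     # the entity element and everything below it
            if (sub.text or "").strip() == instance:
                return True
    return False


def closest_fragment(root, pairs):
    """Return the smallest subtree covering every (entity, instance) pair, or None."""
    best, best_size = None, None
    for node in root.iter():
        if all(covers(node, e, v) for e, v in pairs):
            size = sum(1 for _ in node.iter())  # number of elements in the subtree
            if best is None or size < best_size:
                best, best_size = node, size
    return best


# Hypothetical sample document, not taken from the paper.
DOC = """
<plant>
  <order id="1">
    <part>wing</part>
    <supplier>ACME</supplier>
  </order>
  <order id="2">
    <part>wing</part>
    <supplier>Globex</supplier>
  </order>
</plant>
"""

root = ET.fromstring(DOC)
fragment = closest_fragment(root, [("part", "wing"), ("supplier", "Globex")])
print(fragment.tag, fragment.get("id"))
```

Here the second `order` element is chosen over the root because it is the smaller subtree that still covers both pairs. The exhaustive scan is quadratic in the number of elements; the paper's ECSF search presumably does better, but the abstract gives no detail on how.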