Self-optimization of IE System Based on Ontology Learning and Dynamic Content Identification (cited by: 1)
Abstract: The pressure of massive volumes of online information drove the emergence and development of information extraction (IE). As IE technology matures, improving its overall performance (accuracy, efficiency, coverage, and maintenance cost) has become a core problem awaiting a breakthrough, and the key to solving it is to strengthen an IE system's ability to optimize itself while it is running. Current approaches to optimizing IE systems rely on too much manual work and place overly strict demands on the training set. To address these problems, this paper proposes a self-optimization approach that combines ontology learning with dynamic content identification (DCI): DCI structures the extraction results, the newly discovered concepts drive ontology learning, and the updated ontology is then used to generate new extraction patterns. Iterating this loop lets the IE system optimize itself continuously. Finally, the paper designs an experimental scheme for a concrete system and carries out the experiments; the results show that extraction accuracy and coverage improve significantly under this self-optimization scheme.
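The loop described in the abstract (extract, identify new concepts, update the ontology, regenerate patterns, repeat) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not the paper's implementation: the ontology is a plain dictionary of concept names to known terms, a "pattern" is just the word that precedes a known term, and the function names `generate_patterns`, `identify_new_concepts`, and `self_optimize` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Toy stand-in for a domain ontology: concept name -> known instance terms.
Ontology = Dict[str, Set[str]]


def generate_patterns(ontology: Ontology, corpus: List[str]) -> Dict[str, Set[str]]:
    """Derive one naive 'pattern' per concept: the set of cue words that
    directly precede a known instance of that concept in the corpus."""
    patterns: Dict[str, Set[str]] = {}
    for concept, terms in ontology.items():
        cues: Set[str] = set()
        for sentence in corpus:
            words = sentence.lower().split()
            for prev, word in zip(words, words[1:]):
                if word in terms:
                    cues.add(prev)
        patterns[concept] = cues
    return patterns


def identify_new_concepts(
    patterns: Dict[str, Set[str]], ontology: Ontology, corpus: List[str]
) -> Set[Tuple[str, str]]:
    """Apply the patterns and return (concept, term) pairs for terms that
    follow a cue word but are not yet in the ontology."""
    found: Set[Tuple[str, str]] = set()
    for concept, cues in patterns.items():
        known = ontology[concept]
        for sentence in corpus:
            words = sentence.lower().split()
            for prev, word in zip(words, words[1:]):
                if prev in cues and word not in known:
                    found.add((concept, word))
    return found


def self_optimize(ontology: Ontology, corpus: List[str], max_rounds: int = 10) -> int:
    """Iterate pattern generation and ontology update until nothing new is
    found. Returns the number of rounds that added at least one term."""
    productive_rounds = 0
    for _ in range(max_rounds):
        patterns = generate_patterns(ontology, corpus)
        candidates = identify_new_concepts(patterns, ontology, corpus)
        if not candidates:
            break
        for concept, term in candidates:
            ontology[concept].add(term)  # the ontology is updated in place
        productive_rounds += 1
    return productive_rounds


if __name__ == "__main__":
    corpus = [
        "the company acquired acme",
        "the fund acquired globex",
        "analysts praised globex",
        "analysts praised initech",
    ]
    ontology = {"organization": {"acme"}}
    rounds = self_optimize(ontology, corpus)
    print(rounds, sorted(ontology["organization"]))
```

In this toy corpus the seed term yields the cue "acquired", which surfaces a second term; that term in turn yields the cue "praised", which surfaces a third. A real system would need far stricter pattern and concept validation, since a single-word cue like this one overgenerates quickly on realistic text.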
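Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), indexed in CSSCI and the Peking University Core Journals list, 2011, No. 5, pp. 487-494 (8 pages)
Funding: This work is one of the research outcomes of a national defense technology foundation project.
Keywords: information extraction; ontology learning; content identification; self-optimization of IE system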