Abstract
The pressure of massive volumes of online information has driven the emergence and development of information extraction (IE). As IE technology matures, improving its overall performance (accuracy, efficiency, and coverage, together with lower maintenance cost) has become a core open problem, and the key to solving it is to strengthen the ability of an IE system to optimize itself while it runs. Current approaches to optimizing IE systems require too much manual work and overly demanding training sets. To address these problems, this paper proposes a self-optimization approach that combines ontology learning with dynamic content identification (DCI): DCI structures the extraction results, the newly discovered concepts drive ontology learning, the updated ontology generates new extraction patterns, and the cycle is iterated so that the IE system optimizes itself continuously. Finally, we design and run a system experiment; the results show that extraction accuracy and coverage improve significantly under this self-optimization scheme.
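The iterative loop summarized in the abstract can be illustrated with a small toy. This is a minimal sketch under stated assumptions, not the authors' implementation: the ontology is reduced to a set of attribute labels, every extraction pattern is a hypothetical regular expression of the form `label: value`, and the hypothetical function `identify_new_concepts` stands in for dynamic content identification by counting unseen labels in the text that the current patterns left uncovered. The names `generate_patterns`, `extract`, `self_optimize`, `min_support`, and `max_iter` are all illustrative.

```python
import re
from collections import Counter

# Hypothetical "label: value" syntax used by this toy for both patterns and DCI.
LABEL = re.compile(r"([a-z][a-z ]*?)\s*:\s*([^;\n]+)")


def generate_patterns(ontology):
    """Build one extraction pattern per ontology concept (hypothetical regex scheme)."""
    return {c: re.compile(rf"\b{re.escape(c)}\s*:\s*([^;\n]+)") for c in sorted(ontology)}


def extract(docs, patterns):
    """Apply the patterns; return one record per document plus the uncovered text."""
    records, residuals = [], []
    for doc in docs:
        record, residual = {}, doc
        for concept, pattern in patterns.items():
            match = pattern.search(residual)
            if match:
                record[concept] = match.group(1).strip()
                # Cut the matched span so only unexplained content reaches DCI.
                residual = residual[:match.start()] + residual[match.end():]
        records.append(record)
        residuals.append(residual)
    return records, residuals


def identify_new_concepts(residuals, ontology, min_support=2):
    """Stand-in for dynamic content identification (hypothetical): structure the
    leftover text into label/value pairs and keep unseen labels that occur in at
    least `min_support` documents."""
    counts = Counter()
    for text in residuals:
        labels = {m.group(1).strip() for m in LABEL.finditer(text)}
        counts.update(labels - ontology)
    return {label for label, n in counts.items() if n >= min_support}


def self_optimize(docs, seed_ontology, max_iter=10, min_support=2):
    """Loop: ontology -> patterns -> extraction -> DCI -> ontology learning."""
    ontology = set(seed_ontology)
    for _ in range(max_iter):
        records, residuals = extract(docs, generate_patterns(ontology))
        new_concepts = identify_new_concepts(residuals, ontology, min_support)
        if not new_concepts:  # fixed point: nothing new was learned this round
            return ontology, records
        ontology |= new_concepts  # ontology learning, reduced here to a set union
    # Iteration budget exhausted: re-extract so records match the final ontology.
    records, _ = extract(docs, generate_patterns(ontology))
    return ontology, records


docs = [
    "name: A1; price: 30; color: red",
    "name: B2; price: 45; color: blue; weight: 2kg",
    "name: C3; weight: 1kg",
]
ontology, records = self_optimize(docs, {"name", "price"})
print(sorted(ontology))  # ['color', 'name', 'price', 'weight']
print(records[2])        # {'name': 'C3', 'weight': '1kg'}
```

Starting from the seed ontology `{"name", "price"}`, the first round leaves `color` and `weight` unexplained in two documents each, so both are added to the ontology; the second round extracts them and finds nothing new, which ends the loop. Raising `min_support` makes this toy more conservative about accepting newly discovered concepts.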
Source
Journal of the China Society for Scientific and Technical Information (《情报学报》)
CSSCI
Peking University Core Journal (北大核心)
2011, Issue 5, pp. 487-494 (8 pages)
Funding
One of the research results of a National Defense Technology Foundation project
Keywords
information extraction
ontology learning
content identification
self-optimization of IE system