Abstract
A knowledge base is an essential resource for many natural language processing tasks. Building one remains difficult, and research on automatically constructing complex domain knowledge bases is still at an exploratory stage. This paper proposes a method for automatically constructing a domain knowledge base within an entity-attribute framework. The method uses an aviation encyclopedic dictionary to automatically extract hypernym-hyponym relations between terms and some entity-attribute relations. Hypernym-hyponym term pairs are extracted with a multi-strategy approach that combines three methods: suffix substring matching, automatic pattern construction, and nature extraction (实质提取). Each method exploits a different kind of hypernym-hyponym information in the dictionary. The automatic pattern construction method achieves fairly good results without any manually annotated corpus. Attribute extraction uses a pattern matching method that relies on a manually annotated corpus. Experiments show that the system reaches an F-score of 76.01% for hypernym-hyponym relation extraction and above 75% for the extraction of each attribute.
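The suffix substring matching strategy named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual algorithm, so the rule used here is an assumption: a term is treated as a hyponym of the longest other term in the vocabulary that is a proper suffix of it. The function name and the sample terms are hypothetical and chosen only for illustration.

```python
def suffix_hyponym_pairs(terms):
    """Return (hyponym, hypernym) pairs found by suffix matching.

    Assumed rule (not taken from the paper): if a proper suffix of a
    term is itself a term in the vocabulary, the suffix is a candidate
    hypernym. Only the longest matching suffix is kept per term.
    """
    vocabulary = set(terms)
    pairs = []
    for term in terms:
        # start=1 skips the full term; suffixes get shorter as start grows,
        # so the first hit is the longest matching suffix.
        for start in range(1, len(term)):
            suffix = term[start:]
            if suffix in vocabulary:
                pairs.append((term, suffix))
                break
    return pairs


# Hypothetical aviation terms, used only as sample input.
terms = ["飞机", "战斗机", "喷气式战斗机", "发动机", "涡轮喷气发动机"]
pairs = suffix_hyponym_pairs(terms)
for hyponym, hypernym in pairs:
    print(f"{hyponym} is-a {hypernym}")
```

Keeping only the longest matching suffix is a design choice that favors the most specific candidate hypernym. A real system would likely need extra filtering, since a shared suffix does not always signal a genuine hypernym-hyponym relation.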
Source
Journal of Shenyang Aerospace University (《沈阳航空航天大学学报》)
2011, Issue 2, pp. 69-73 (5 pages)
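Funding
Key Project of Science and Technology Research, Ministry of Education (Project No. 207148)
Liaoning Province University Innovation Team Support Program (Project No. 2007T139)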