Abstract
Acquiring entity relationships in specialized subdomains is a key problem for deepening and broadening the application of knowledge engineering, and its core difficulty today is a heavy dependence on manually annotated corpora. A natural remedy is to use a knowledge base that already exists in the subdomain, or that can be obtained at low cost, as guidance. Unlike general-purpose knowledge bases, subdomain knowledge bases are often small, so it is necessary not only to use the knowledge they contain directly but also to fully exploit the regularities embedded in them, the "domain meta-knowledge". This paper proposes a scheme for discovering entity relationships in a subdomain that combines domain meta-knowledge with word embedding analogy. First, entity relationship constraints for the specific subdomain are abstracted from the existing knowledge base; for example, the symptom-manifestation relationship is formed by <disease, symptom> entity pairs. Second, word embedding vectors of the domain entities are computed from a corresponding domain corpus. Third, positive and negative reference vectors for the word embedding analogy of each relationship type are learned from a small number of high-quality entity relationships in the knowledge base, and an entity relationship classifier is trained on that basis. Finally, for a given domain entity, the relationship constraints, word embedding similarity, and the classification of word embedding analogy results are combined to obtain the entities that form each type of relationship with it. In an experiment on cardiovascular-domain data, a small amount of domain knowledge extracted from an encyclopedia was sufficient to achieve good entity relationship recognition.
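As a rough illustration of the pipeline summarized in the abstract, the sketch below combines a type constraint with a word embedding analogy score. It is a simplified stand-in, not the authors' implementation: the embeddings are hand-made toy vectors rather than vectors trained on a domain corpus, the relationship name `has_symptom`, the entity lists, and the seed pairs are hypothetical, and a mean offset vector with cosine scoring replaces the trained classifier over positive and negative reference vectors.

```python
import numpy as np

# Toy, hand-made embeddings (hypothetical values, not trained on any corpus).
EMBEDDINGS = {
    "hypertension": np.array([1.0, 0.0, 0.0]),
    "angina": np.array([0.0, 1.0, 0.0]),
    "arrhythmia": np.array([1.0, 1.0, 0.0]),
    "headache": np.array([1.0, 0.0, 1.0]),
    "chest pain": np.array([0.0, 1.0, 1.0]),
    "palpitations": np.array([1.0, 1.0, 1.0]),
    "aspirin": np.array([0.0, 1.0, -1.0]),
}

# Entity types, as a small domain knowledge base might record them.
ENTITY_TYPES = {
    "hypertension": "disease",
    "angina": "disease",
    "arrhythmia": "disease",
    "headache": "symptom",
    "chest pain": "symptom",
    "palpitations": "symptom",
    "aspirin": "drug",
}

# Domain meta-knowledge: each relationship admits only certain (head, tail) types.
RELATION_CONSTRAINTS = {"has_symptom": ("disease", "symptom")}

# A few high-quality seed pairs taken from the (hypothetical) knowledge base.
SEED_PAIRS = {"has_symptom": [("hypertension", "headache"), ("angina", "chest pain")]}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def relation_offset(relation):
    """Mean (tail - head) offset over the seed pairs of one relationship."""
    offsets = [EMBEDDINGS[t] - EMBEDDINGS[h] for h, t in SEED_PAIRS[relation]]
    return np.mean(offsets, axis=0)


def rank_candidates(head, relation):
    """Rank type-compatible tail entities by how well they fit the analogy."""
    head_type, tail_type = RELATION_CONSTRAINTS[relation]
    if ENTITY_TYPES[head] != head_type:
        return []  # the head entity violates the relationship constraint
    benchmark = relation_offset(relation)
    scored = [
        (name, cosine(EMBEDDINGS[name] - EMBEDDINGS[head], benchmark))
        for name, etype in ENTITY_TYPES.items()
        if etype == tail_type  # constraint filter: keep only allowed tail types
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_candidates("arrhythmia", "has_symptom"):
        print(f"{name}: {score:.3f}")
```

With these toy vectors, "palpitations" ranks first for "arrhythmia", and "aspirin" is never scored because its type fails the constraint. A fuller version would presumably train embeddings on a domain corpus and learn a classifier from positive and negative analogy examples, as the abstract describes.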
Authors
陈果
许天祥
Chen Guo; Xu Tianxiang (Department of Information Management, Nanjing University of Science and Technology, Nanjing 210094; Jiangsu Science and Technology Collaborative Innovation Center of Social Public Safety, Nanjing 210094)
Source
Journal of the China Society for Scientific and Technical Information (《情报学报》)
CSSCI
CSCD
Peking University Core Journals (北大核心)
2019, No. 11, pp. 1200-1211 (12 pages)
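Funding
National Social Science Fund of China, Youth Project "Semantic Mining of Scientific and Technical Vocabulary and Knowledge Evolution from the Perspective of Domain Analysis" (16CTQ024)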
Keywords
domain entity relationship
word embedding analogy
term analysis
domain knowledge analysis