Abstract
Entity relation extraction, one of the main tasks of information extraction, aims to determine the relation category between two entities in unstructured text. Supervised methods currently achieve high accuracy but are limited by their need for large manually annotated corpora. Distant supervision instead obtains a large number of relation triples by heuristically aligning a knowledge base with a text collection, and is the main approach to large-scale relation extraction. Existing work on distant supervision relation extraction does not fully exploit the high-level semantics of the context words in a sentence and does not consider the dependency and inclusion relationships among relations. To address these problems, this paper proposes a distant supervision relation extraction model based on a multi-level attention mechanism. The model first encodes the word vectors of a sentence with a bidirectional GRU (Gated Recurrent Unit) neural network to obtain the high-dimensional semantics of the sentence. Second, word-level attention computes the degree of correlation between the two entities and the context words, so that the semantic information of the entity context is captured adequately. Third, sentence-level attention is built over multiple instances to reduce the wrong-labeling problem. Finally, relation-level attention automatically learns the dependency and inclusion relationships among different relations. Experimental results on the public FreeBase+NYT dataset show that adding word-level, sentence-level, and relation-level attention to the bidirectional GRU model each improves distant supervision relation extraction. The multi-level attention model obtained by fusing the three attention levels improves accuracy and recall by about 4% over existing mainstream methods, achieving better relation extraction and thus laying a theoretical foundation for further applications such as knowledge graph construction and intelligent question answering.
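The word-level and sentence-level attention steps described in the abstract both come down to scoring a set of vectors against a query and taking a weighted sum. The following is a minimal sketch of that pooling step, assuming dot-product scoring and softmax normalization; the abstract does not state the paper's actual scoring functions, so this is an illustration and not the authors' model. The names `attention_pool`, `bag`, and `relation_query` are hypothetical, and the vectors are toy values.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_pool(vectors, query):
    """Score each vector against the query (assumed dot-product scoring),
    then return the softmax weights and the weighted sum of the vectors."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return weights, pooled


# Sentence-level use: a "bag" of sentence vectors that mention the same
# entity pair, pooled against a relation query vector (toy values).
bag = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
relation_query = [2.0, 0.0]
weights, pooled = attention_pool(bag, relation_query)
```

At the word level the same function would take the per-word hidden states of one sentence as `vectors` and a representation of the entity pair as `query`. Sentences (or words) that align poorly with the query receive small weights, which is how attention over multiple instances can down-weight wrongly labeled sentences.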
Authors
LI Hao (李浩)
LIU Yong-jian (刘永坚)
XIE Qing (解庆)
TANG Ling-li (唐伶俐)
School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China; State Press and Publication Administration Publishing Fusion Development Key Laboratory, Wuhan 430070, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals
2019, Issue 10, pp. 252-257 (6 pages)
Funding
National Natural Science Foundation of China (61602353)
Natural Science Foundation of Hubei Province (2017CFB505)
Keywords
Distant supervision
Relation extraction
Bidirectional GRU
Word embedding
Attention mechanism