Abstract
Compared with Chinese and English, the training corpora available for Tibetan entity relation extraction are small, so traditional supervised learning methods struggle to reach high accuracy. Distant supervision can relieve this shortage of large-scale annotated data, but it introduces wrongly labeled instances. To address both problems, a dataset for Tibetan entity relation extraction is constructed by aligning a knowledge base with text through distant supervision, and a Tibetan entity relation extraction model based on a multi-level attention fusion mechanism is proposed. At the word level, self-attention extracts the internal features of words. At the sentence level, a selective attention mechanism assigns a weight to each instance, making full use of informative sentences and reducing the weights of noisy instances. In addition, a joint score function is introduced to correct the wrong labels produced by distant supervision, and a neural network is combined with a support vector machine to classify Tibetan entity relations. Experimental results show that the proposed model effectively improves the accuracy of Tibetan entity relation extraction and outperforms the baseline models.
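The sentence-level weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so the dot-product score against a relation query vector used here is an assumption borrowed from the common selective-attention formulation, and the names `selective_attention`, `sentence_vecs`, and `relation_query` are hypothetical.

```python
import math

def selective_attention(sentence_vecs, relation_query):
    """Weight each sentence in a bag by how well it matches a relation query.

    Assumption: the score is a plain dot product; the paper may use a
    different scoring function.
    """
    # Score every sentence vector against the relation query.
    scores = [sum(s_i * q_i for s_i, q_i in zip(s, relation_query))
              for s in sentence_vecs]
    # Softmax over the bag, subtracting the max for numerical stability.
    top = max(scores)
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Bag representation: attention-weighted sum of the sentence vectors.
    dim = len(relation_query)
    bag_vec = [sum(w * s[d] for w, s in zip(weights, sentence_vecs))
               for d in range(dim)]
    return weights, bag_vec

# Toy bag: the second sentence is unrelated to the query and acts as noise.
bag = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query = [2.0, 0.0]
weights, bag_vec = selective_attention(bag, query)
```

In this toy bag the noisy second sentence receives the smallest weight, so it contributes least to the bag representation passed on to the classifier.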
Authors
WANG Like (王丽客)
SUN Yuan (孙媛)
LIU Sisi (刘思思)
School of Information Engineering, Minzu University of China, Beijing 100081, China; National Language Resource Monitoring and Research Center of Minority Languages, Minzu University of China, Beijing 100081, China
Source
Chinese Journal of Intelligent Science and Technology (《智能科学与技术学报》)
2021, No. 4, pp. 466-473 (8 pages)
Funding
Supported by the National Natural Science Foundation of China (No. 61972436).
Keywords
Tibetan
entity relation extraction
multi-level attention fusion mechanism
support vector machine