Abstract
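Relation extraction classifies the relation that holds between an entity pair in a sentence. Distantly supervised relation extraction aligns a pre-built knowledge base with plain text to label training data automatically. This reduces the cost of manual annotation to some extent and eases the shortage of Tibetan corpora. However, distantly supervised entity relation extraction still suffers from wrong labels and from noise introduced during feature extraction. This paper applies distant supervision to Tibetan entity relation extraction. Building on an existing Tibetan knowledge base, it uses a piecewise convolutional neural network (PCNN) and adds a language model and an attention mechanism to reduce semantic ambiguity and to learn sentence-level information. During training, a joint score function is introduced to dynamically correct wrong labels. Experimental results show that the improved model effectively raises the accuracy of Tibetan entity relation extraction and outperforms the baseline models.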
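The abstract names two building blocks: PCNN feature extraction and an attention mechanism over sentences. The sketch below illustrates both under stated assumptions; it is not the authors' code. It follows the common formulation in which each convolution filter's output is split at the two entity positions into three segments, with each segment max-pooled. Sentence-level selective attention then weights the sentences in a bag that share one entity pair. The function names, the handling of empty segments, and the plain dot-product attention score (a simplification of bilinear scoring) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def piecewise_max_pool(conv_out: Sequence[Sequence[float]], e1: int, e2: int) -> List[float]:
    """Hypothetical helper: piecewise max pooling over a convolution feature map.

    conv_out has one row per filter and one column per token position; e1 < e2
    are the token indices of the two entities. Each row is cut into the segments
    [0, e1], [e1 + 1, e2] and [e2 + 1, end]. Each segment is max-pooled, and tanh
    is applied to the concatenated result (three values per filter).
    """
    if not 0 <= e1 < e2:
        raise ValueError("expected 0 <= e1 < e2")
    pooled: List[float] = []
    for row in conv_out:
        segments = (row[: e1 + 1], row[e1 + 1 : e2 + 1], row[e2 + 1 :])
        for seg in segments:
            # Assumption: an empty segment (entity at the sentence edge) pools to 0.0.
            pooled.append(max(seg) if seg else 0.0)
    return [math.tanh(v) for v in pooled]


def selective_attention(
    sentence_vecs: Sequence[Sequence[float]], relation_query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: sentence-level attention over one bag of sentences.

    Each sentence vector is scored against a relation query vector by a dot
    product (an assumed simplification). The scores are softmax-normalised into
    weights, and the bag vector is the weighted sum of the sentence vectors, so
    sentences that match the relation poorly contribute less.
    """
    scores = [sum(a * b for a, b in zip(x, relation_query)) for x in sentence_vecs]
    m = max(scores)  # subtract the max score for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(sentence_vecs[0])
    bag = [sum(a * x[k] for a, x in zip(alphas, sentence_vecs)) for k in range(dim)]
    return alphas, bag
```

Splitting the pooling at the entity positions keeps coarse structural information (before, between, and after the entities) that a single max-pool over the whole sentence would discard. The attention weights let a bag containing a few mislabelled sentences still produce a usable representation.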
Distant supervision for relation extraction is an efficient method that automatically aligns entities in texts with a given knowledge base (KB), which reduces the burden of manual labelling. In this paper, we propose an improved distantly supervised relation extraction model for Tibetan based on the Piecewise Convolutional Neural Network (PCNN). A language model and a selective-attention mechanism are combined to alleviate the wrong-labelling problem and to extract effective features. A soft-label method is also introduced to dynamically correct relation labels. The experimental results show that our method is effective and outperforms several competitive baseline methods.
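The abstract does not spell out the soft-label (joint score) correction, so the following is a minimal sketch of one common formulation, assumed here for illustration. The model's predicted relation distribution is combined with a bonus for the distant-supervision label. That bonus is scaled by the model's own maximum probability and by a per-relation confidence weight. The highest joint score becomes the training label for that step. The function name `soft_label` and the `confidence` vector are hypothetical.

```python
from typing import Sequence


def soft_label(probs: Sequence[float], ds_label: int, confidence: Sequence[float]) -> int:
    """Hypothetical helper: pick a corrected relation label from a joint score.

    probs:      the model's predicted distribution over relations for one entity pair.
    ds_label:   index of the relation assigned by distant supervision.
    confidence: assumed per-relation weights in [0, 1] expressing how far the
                distant-supervision label is trusted.
    Joint score for relation j: probs[j] + max(probs) * confidence[j] if j is the
    distant-supervision label, otherwise just probs[j].
    """
    p_max = max(probs)
    scores = [
        p + (p_max * confidence[j] if j == ds_label else 0.0)
        for j, p in enumerate(probs)
    ]
    # max() returns the first maximal index, so ties resolve to the lowest index.
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, with `probs = [0.1, 0.8, 0.1]` and a distant-supervision label of 0, a confidence of 0.5 gives joint scores 0.5, 0.8, 0.1, and the label is corrected to 1. A confidence of 0.9 gives 0.82, 0.8, 0.1, and the original label 0 is kept. Higher confidence therefore makes the correction more conservative.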
Authors
WANG Like
SUN Yuan
XIA Tianci
(School of Information Engineering, Minzu University of China, Beijing 100081, China; Minority Languages Branch, National Language Resource and Monitoring Research Center, Minzu University of China, Beijing 100081, China)
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal
2020, No. 3, pp. 72-79 (8 pages)
Funding
National Natural Science Foundation of China (61972436).
Keywords
Tibetan entity relation extraction
language model
attention mechanism