Abstract
Relation extraction between entities in medical text currently faces two problems. Traditional word-vector representations cannot resolve polysemy in medical text, and models based on long short-term memory (LSTM) networks extract local semantic features insufficiently, so they fail to fully capture the latent internal associations in medical text. To address these problems, a medical-text entity relation extraction model based on XLNet-BiGRU-Attention-TextCNN is proposed. The pre-trained language model XLNet first converts the input medical text into vectors. A bidirectional gated recurrent unit network (BiGRU) then extracts long-distance dependencies within each sentence, and an attention mechanism (Attention) assigns weights to the resulting feature sequence to reduce the influence of noise. Finally, a text convolutional neural network (TextCNN) extracts local features from the sequence, and a softmax layer outputs the relation extraction result. Experimental results show that the proposed model outperforms the baseline models in precision, recall, and F-score.
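The last three stages of the pipeline described in the abstract (attention weighting, TextCNN-style local feature extraction, softmax output) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract gives no formulas or hyperparameters, so the dot-product attention form, the filter widths, and every number and name here (`attention_reweight`, `conv_max_pool`, `classify`, `features`, `query`, `kernels`, `out_weights`) are assumptions chosen for illustration. The `features` list is a hand-written stand-in for BiGRU outputs over XLNet embeddings, which are not computed here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_reweight(features, query):
    """Scale each time step by a softmax weight.

    The score of a step is its dot product with `query`
    (an assumed attention form, not taken from the paper).
    """
    weights = softmax([dot(h, query) for h in features])
    reweighted = [[w * x for x in h] for w, h in zip(weights, features)]
    return reweighted, weights


def conv_max_pool(features, kernel):
    """One TextCNN-style filter: slide over windows of len(kernel)
    time steps, apply ReLU, then take the maximum over time."""
    width = len(kernel)
    activations = []
    for start in range(len(features) - width + 1):
        window = features[start:start + width]
        total = sum(dot(row, k_row) for row, k_row in zip(window, kernel))
        activations.append(max(0.0, total))
    return max(activations)


def classify(features, query, kernels, out_weights):
    """Attention -> convolution + max pooling -> linear layer -> softmax."""
    reweighted, _ = attention_reweight(features, query)
    pooled = [conv_max_pool(reweighted, k) for k in kernels]
    logits = [dot(pooled, w) for w in out_weights]
    return softmax(logits)


# Hypothetical stand-in for BiGRU outputs: 4 tokens, 3 dimensions each.
features = [
    [0.1, 0.2, 0.0],
    [0.9, 0.1, 0.4],
    [0.3, 0.8, 0.5],
    [0.0, 0.1, 0.2],
]
query = [1.0, 0.5, 0.0]  # assumed attention query vector
kernels = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],                    # filter of width 2
    [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]],   # filter of width 3
]
out_weights = [[1.0, -1.0], [-1.0, 1.0]]  # two hypothetical relation classes

probs = classify(features, query, kernels, out_weights)
print(probs)  # one probability per relation class
```

In a real system the toy vectors would be replaced by trained XLNet and BiGRU layers from a deep learning framework, and the query, filters, and output weights would be learned rather than fixed by hand.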
Authors
郑增亮
沈宙锋
苏前敏
ZHENG Zengliang; SHEN Zhoufeng; SU Qianmin (School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China)
Source
《智能计算机与应用》
2023, Issue 4, pp. 8-13 (6 pages)
Intelligent Computer and Applications
Funding
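National Science and Technology Major Project of the 13th Five-Year Plan (2018ZX09711001-009-011)
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0109300).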