Abstract
Chinese pre-trained language models can represent rich sentence-level feature information and resolve the "polysemy" problem in Chinese, and are widely used in natural language processing tasks. This paper studies the application of pre-trained models to named entity recognition (NER) in Chinese electronic medical records (EMRs), exploring a new information-optimization approach for deep-learning-based Chinese EMR information extraction. It first introduces four pre-trained language models (BERT, ERNIE, ALBERT, and NEZHA), then builds a fusion architecture of pre-trained model, BiLSTM, and CRF, and carries out a medical named entity recognition task on the CCKS2018 Chinese EMR dataset. The experimental results show that NEZHA achieves the best recognition performance among the pre-trained models evaluated.
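The CRF layer in the fusion architecture described above decodes the BiLSTM's per-token tag scores into a globally consistent label sequence via Viterbi search. A minimal pure-Python sketch of that decoding step is shown below; the tag set, emission scores, and transition scores are made-up illustrative values, not trained weights from the paper.

```python
# Toy sketch of the CRF decoding step in a pretrained-model + BiLSTM + CRF
# tagger: the BiLSTM head emits per-token tag scores, the CRF's transition
# matrix scores tag-to-tag moves, and Viterbi finds the best joint path.
# All numbers here are illustrative, not trained weights.

TAGS = ["O", "B-DISEASE", "I-DISEASE"]

def viterbi_decode(emissions, transitions):
    """emissions: list of {tag: score} per token; transitions: {(prev, cur): score}."""
    n = len(emissions)
    score = [dict(emissions[0])]  # best path score ending in each tag
    back = [{}]                   # backpointers to the best previous tag
    for t in range(1, n):
        score.append({})
        back.append({})
        for cur in TAGS:
            best_prev, best = max(
                ((p, score[t - 1][p] + transitions[(p, cur)]) for p in TAGS),
                key=lambda x: x[1],
            )
            score[t][cur] = best + emissions[t][cur]
            back[t][cur] = best_prev
    # Trace the highest-scoring path backwards from the last token.
    last = max(TAGS, key=lambda tag: score[-1][tag])
    path = [last]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Emission scores as a BiLSTM head might produce for the 4-character
# entity mention "急性胃炎" (acute gastritis).
emissions = [
    {"O": 0.1, "B-DISEASE": 2.0, "I-DISEASE": 0.2},
    {"O": 0.1, "B-DISEASE": 0.3, "I-DISEASE": 1.8},
    {"O": 0.2, "B-DISEASE": 0.2, "I-DISEASE": 1.9},
    {"O": 0.2, "B-DISEASE": 0.1, "I-DISEASE": 1.7},
]
# The CRF penalizes the illegal BIO transition O -> I-DISEASE.
transitions = {(p, c): 0.0 for p in TAGS for c in TAGS}
transitions[("O", "I-DISEASE")] = -10.0

print(viterbi_decode(emissions, transitions))
# -> ['B-DISEASE', 'I-DISEASE', 'I-DISEASE', 'I-DISEASE']
```

The transition penalty is what the CRF layer contributes: even if a token's emission scores favored `I-DISEASE` after an `O` tag, the decoded sequence would avoid that illegal transition.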
Authors
Wu Xiao-xue; Zhang Qing-hui (Henan University of Technology, Zhengzhou, Henan 450001)
Source
Electronics Quality (《电子质量》)
2020, No. 9, pp. 61-65 (5 pages)
Keywords
pre-trained model
named entity recognition
electronic medical record