Abstract
Named entity recognition in Chinese electronic medical records has traditionally suffered from unclear medical entity boundaries, nested entities, missing sentence components, and heavy reliance on manually extracted features. To address these problems, a named entity recognition model for Chinese electronic medical records is proposed that combines word embeddings with a BiLSTM-CRF model. The electronic medical record text dataset is first de-identified and annotated with sequence labels during preprocessing. Word embeddings are then matched against the record text sequences to produce word-vector representations. A BiLSTM neural network models the record text in both the forward and backward directions to obtain the semantic features of the text sequence, and a CRF layer then predicts the output entity labels. Experimental results show that the improved BiLSTM-CRF model significantly increases the accuracy and recall of entity recognition in medical records.
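The final step described in the abstract, in which a CRF layer turns per-token scores into a consistent label sequence, can be illustrated with a minimal Viterbi decoding sketch. This is not the authors' implementation: the tag set `TAGS`, the function name `viterbi_decode`, and all scores below are hypothetical values chosen for illustration. In a model of the kind described, the emission scores would come from the BiLSTM and the transition scores would be learned during training.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO tag set for one entity type (illustrative, not from the paper).
TAGS = ["O", "B-DIS", "I-DIS"]


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the highest-scoring tag path for one sentence.

    emissions[t][tag] is the score of `tag` at position t.
    transitions[(prev, cur)] is the score of moving from `prev` to `cur`;
    pairs that are not listed default to 0.0.
    """
    if not emissions:
        return []

    # best[t][tag]: best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in TAGS}]
    back: List[Dict[str, str]] = []

    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in TAGS:
            # Pick the previous tag that maximizes path score plus transition.
            prev = max(
                TAGS,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(TAGS, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Made-up scores for a four-token sentence (assumption, for illustration only).
EMISSIONS = [
    {"O": 2.0, "B-DIS": 0.1, "I-DIS": 0.0},
    {"O": 0.2, "B-DIS": 1.5, "I-DIS": 0.3},
    {"O": 1.0, "B-DIS": 0.0, "I-DIS": 0.9},
    {"O": 0.5, "B-DIS": 0.0, "I-DIS": 1.2},
]
TRANSITIONS = {
    ("O", "I-DIS"): -10.0,     # an entity cannot continue straight after "O"
    ("B-DIS", "I-DIS"): 1.0,
    ("I-DIS", "I-DIS"): 1.0,
}

decoded = viterbi_decode(EMISSIONS, TRANSITIONS)
print(decoded)  # ['O', 'B-DIS', 'I-DIS', 'I-DIS']
```

With these scores, picking the best tag for each token independently would give `O, B-DIS, O, I-DIS`, which is not a valid BIO sequence. The transition scores let the decoder prefer a path with well-formed entity boundaries, which is the usual motivation for placing a CRF on top of a BiLSTM.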
Authors
李超凡
马凯
Li Chaofan; Ma Kai (School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou 221004, Jiangsu Province, China)
Source
《中国数字医学》
2022, Issue 4, pp. 32-37 (6 pages)
China Digital Medicine
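Funding
Xuzhou Science and Technology Program, Key Research and Development Plan (KC21308)
Jiangsu Province Postgraduate Education and Teaching Reform Research and Practice Project (JGZZ19_065)
Jiangsu Province College Students' Innovation and Entrepreneurship Project (201810313047Y, 201910313004Z)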
Keywords
Electronic medical record
Named entity recognition
Bidirectional long short-term memory neural network
Conditional random field