Abstract
To address semantic loss and nested named entities in Chinese medical named entity recognition (NER), and to improve entity recognition in the clinical medical records of famous traditional Chinese medicine (TCM) practitioners, this paper proposes an NER model for such records based on RoBERTa-wwm with adaptive word embedding. The raw text of each medical record is first encoded by the pre-trained RoBERTa-wwm model into initial vectors. The Soft-lexicon method then dynamically fuses dictionary information into these vectors for vocabulary enhancement, producing semantic text vectors. A downstream bidirectional long short-term memory network (BiLSTM) learns the sequential dependencies, and a conditional random field (CRF) layer finally decodes the tag sequence to extract the entities. On a dataset of clinical medical records from the famous TCM practitioner Li Tiejun's treatment of cardiovascular disease, the model achieves an F1 score of 86.88%, which is 5.93% and 5.87% higher than the RoBERTa-wwm-CRF and BERT-CRF models, respectively, and it also runs faster. Introducing adaptive word embedding into the standard RoBERTa-wwm model for vocabulary enhancement lets the model learn textual semantics better, giving it a clear advantage over the other baseline models on NER in the clinical medical records of famous TCM practitioners.
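The Soft-lexicon step is the part of this pipeline that is least obvious from the abstract alone. The sketch below follows the commonly published form of Soft-lexicon, assuming that the paper uses it unchanged: for every character, the lexicon words that match the sentence are sorted into four sets by the character's position in the word (B = begins the word, M = inside it, E = ends it, S = single-character word). Each set is then pooled into one vector, weighted by word frequency and normalized by the total frequency of all words matched at that character. The helper names `soft_lexicon_sets` and `soft_lexicon_vector`, the toy lexicon, and the frequencies are hypothetical, and an empty set is mapped to a zero vector here as a simplification. In the model described above, the resulting 4·dim vector would be concatenated with the character's RoBERTa-wwm vector before the BiLSTM.

```python
from typing import Dict, List, Sequence, Set

BMES = ("B", "M", "E", "S")


def soft_lexicon_sets(text: str, lexicon: Dict[str, int],
                      max_word_len: int = 6) -> List[Dict[str, Set[str]]]:
    """Hypothetical helper: for each character, collect the lexicon words that
    match the text, grouped by the character's position in the word (B/M/E/S)."""
    sets = [{tag: set() for tag in BMES} for _ in text]
    n = len(text)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_word_len) + 1):
            word = text[start:end]
            if word not in lexicon:
                continue
            if end - start == 1:
                sets[start]["S"].add(word)        # single-character word
            else:
                sets[start]["B"].add(word)        # character begins the word
                sets[end - 1]["E"].add(word)      # character ends the word
                for mid in range(start + 1, end - 1):
                    sets[mid]["M"].add(word)      # character is inside the word
    return sets


def soft_lexicon_vector(char_sets: Dict[str, Set[str]], lexicon: Dict[str, int],
                        embeddings: Dict[str, Sequence[float]], dim: int) -> List[float]:
    """Hypothetical helper: frequency-weighted pooling of each B/M/E/S set,
    normalized by the total frequency Z of all words matched at this character,
    then concatenated into one vector of length 4 * dim."""
    all_words = set().union(*(char_sets[tag] for tag in BMES))
    z = sum(lexicon[w] for w in all_words)
    out: List[float] = []
    for tag in BMES:
        vec = [0.0] * dim
        for w in char_sets[tag]:
            for d in range(dim):
                vec[d] += lexicon[w] * embeddings[w][d]
        if z > 0:
            vec = [x / z for x in vec]
        out.extend(vec)  # empty set stays a zero vector (simplification)
    return out
```

For example, with the illustrative lexicon `{"心悸": 3, "胸闷": 2, "心": 1}`, the first character of "心悸胸闷" gets B = {"心悸"} and S = {"心"}, so its B and S slots carry 3/4 and 1/4 of the weight, respectively.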
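In the BiLSTM-CRF stage, the BiLSTM outputs a score for every tag at every character (the emission scores), and the CRF chooses the tag sequence with the highest total of emission and tag-to-tag transition scores. A minimal sketch of that decoding step (the Viterbi algorithm) is shown below, assuming the emission, transition, start, and end scores have already been produced by a trained model. The function name `viterbi_decode`, the O/B/I tag set, and all numbers are hypothetical and only illustrate the technique; CRF training (the forward algorithm and log-likelihood loss) is not shown.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   start: Sequence[float],
                   end: Sequence[float]) -> List[int]:
    """Hypothetical helper: return the highest-scoring tag index sequence.
    emissions[i][t]   score of tag t at position i (e.g. from a BiLSTM)
    transitions[p][c] score of moving from tag p to tag c
    start[t], end[t]  scores for starting / ending the sequence with tag t"""
    n_tags = len(start)
    # Best score of any path ending in each tag at position 0.
    score = [start[t] + emissions[0][t] for t in range(n_tags)]
    backpointers: List[List[int]] = []
    for step in range(1, len(emissions)):
        new_score: List[float] = []
        bp: List[int] = []
        for cur in range(n_tags):
            # Best previous tag for reaching `cur` at this step.
            best_prev = max(range(n_tags), key=lambda p: score[p] + transitions[p][cur])
            new_score.append(score[best_prev] + transitions[best_prev][cur]
                             + emissions[step][cur])
            bp.append(best_prev)
        score = new_score
        backpointers.append(bp)
    final = [score[t] + end[t] for t in range(n_tags)]
    best_last = max(range(n_tags), key=lambda t: final[t])
    # Follow the backpointers from the last position to the first.
    path = [best_last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return path
```

With tags O = 0, B = 1, I = 2 and a large negative score for the invalid transition O → I, `viterbi_decode([[2.0, 1.0, 0.0], [0.0, 0.0, 3.0]], ...)` returns `[1, 2]` (B, I): the CRF gives up the higher O emission at the first character to obtain a valid and higher-scoring entity span, which is the kind of label consistency a per-character classifier cannot enforce.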
Authors
WAN Ze-yu (万泽宇)
GONG Qing-yue (龚庆悦)
LI Tie-jun (李铁军)
WANG Hong-yun (王红云)
BAO Jian-yang (鲍剑洋)
(College of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine; The Second Jiangsu Provincial Hospital of Chinese Medicine, The Second Affiliated Hospital of Nanjing University of Chinese Medicine; College of Nursing, Nanjing University of Chinese Medicine, Nanjing 210046, China)
Source
Software Guide (《软件导刊》), 2022, No. 12, pp. 58-62 (5 pages)
Keywords
information extraction
named entity recognition
clinical medical records of famous traditional Chinese medicine practitioners
RoBERTa-wwm
vocabulary enhancement