
Research on Named Entity Recognition of Named TCM Clinical Medical Records Based on RoBERTa-wwm Adaptive Word Embedding
Cited by: 1
Abstract: Chinese medical named entity recognition (NER) suffers from missing semantic information and nested named entities. To address these problems and improve entity recognition in the clinical medical records of famous traditional Chinese medicine (TCM) practitioners, this paper proposes an NER model for such records based on RoBERTa-wwm with adaptive word embedding. The pre-trained RoBERTa-wwm model first encodes the raw text of each record into initial character vectors. The Soft-lexicon method then enhances these vectors with vocabulary by dynamically fusing in dictionary information. The resulting semantic vectors pass to a downstream bidirectional long short-term memory (BiLSTM) layer, which learns sequence dependencies. Finally, a conditional random field (CRF) layer decodes the entities. On a dataset of clinical records from the famous TCM practitioner Li Tiejun's treatment of cardiovascular disease, the model achieves an F1 score of 86.88%. This is 5.93% and 5.87% higher than the RoBERTa-wwm-CRF and BERT-CRF models respectively, and the model is also faster. Adding adaptive word embedding to the standard RoBERTa-wwm model for vocabulary enhancement helps the model learn the semantic information of the text. Compared with the other baseline models, it shows a clear advantage on NER in the clinical records of famous TCM practitioners.
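The Soft-lexicon fusion step described in the abstract above can be sketched in plain Python. This is a minimal, hedged illustration that follows the Soft-lexicon method as originally proposed for Chinese NER (Ma et al., ACL 2020). Each character collects the lexicon words in which it appears as Begin, Middle, End, or Single. Each of the four sets is pooled with weights proportional to word frequency, and the pooled vectors are concatenated onto the character vector. The function names, the toy lexicon and its frequencies, the 2-d word embeddings, and the fake character vectors standing in for RoBERTa-wwm output are all hypothetical. The paper's actual lexicon, weighting, and handling of empty sets may differ. In particular, an empty set here simply contributes a zero vector.

```python
from typing import Dict, List, Sequence

Vector = List[float]
ROLES = ("B", "M", "E", "S")


def soft_lexicon_sets(sentence: str, lexicon: Dict[str, int],
                      max_word_len: int = 4) -> List[Dict[str, List[str]]]:
    """For each character, group matched lexicon words by the character's role:
    B = a word begins here, M = the character is strictly inside a word,
    E = a word ends here, S = the character alone is a lexicon word."""
    sets = [{role: [] for role in ROLES} for _ in sentence]
    for start in range(len(sentence)):
        for end in range(start, min(len(sentence), start + max_word_len)):
            word = sentence[start:end + 1]
            if word not in lexicon:
                continue
            if start == end:
                sets[start]["S"].append(word)
                continue
            sets[start]["B"].append(word)
            sets[end]["E"].append(word)
            for mid in range(start + 1, end):
                sets[mid]["M"].append(word)
    return sets


def pool_sets(char_sets: Dict[str, List[str]], freq: Dict[str, int],
              word_emb: Dict[str, Vector], dim: int) -> Vector:
    """Frequency-weighted pooling of each role set, concatenated as [B; M; E; S].
    Word w gets weight 4 * z(w) / Z, where Z sums z over all four sets.
    Assumes every lexicon frequency is positive."""
    z_total = sum(freq[w] for role in ROLES for w in char_sets[role])
    pooled: Vector = []
    for role in ROLES:
        vec = [0.0] * dim  # empty set -> zero vector (simplification of "NONE")
        for w in char_sets[role]:
            weight = 4.0 * freq[w] / z_total
            for d in range(dim):
                vec[d] += weight * word_emb[w][d]
        pooled.extend(vec)
    return pooled


def soft_lexicon_features(sentence: str, char_vectors: Sequence[Vector],
                          freq: Dict[str, int], word_emb: Dict[str, Vector],
                          dim: int) -> List[Vector]:
    """Concatenate each character vector (a stand-in for RoBERTa-wwm output)
    with its pooled B/M/E/S lexicon vector; a BiLSTM-CRF would consume these."""
    sets = soft_lexicon_sets(sentence, freq)
    return [list(cv) + pool_sets(s, freq, word_emb, dim)
            for cv, s in zip(char_vectors, sets)]


if __name__ == "__main__":
    # Hypothetical toy lexicon (word -> frequency) and 2-d word embeddings.
    freq = {"冠心病": 4, "心病": 1, "病": 2, "胸闷": 3}
    word_emb = {"冠心病": [1.0, 0.0], "心病": [0.0, 1.0],
                "病": [1.0, 1.0], "胸闷": [0.5, 0.5]}
    sentence = "冠心病胸闷"  # "coronary heart disease, chest tightness"
    char_vectors = [[0.1 * i] * 3 for i in range(len(sentence))]  # fake 3-d vectors
    feats = soft_lexicon_features(sentence, char_vectors, freq, word_emb, dim=2)
    print(len(feats), len(feats[0]))  # 5 characters, 3 + 4 * 2 = 11 features each
```

In this example, "心" sits in the Middle set of "冠心病" and the Begin set of "心病". It therefore keeps word-boundary information that a purely character-level encoder would lose. Weighting by frequency rather than averaging uniformly lets common, reliable words dominate a character's lexicon feature. The BiLSTM and CRF layers are not shown.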
Authors: WAN Ze-yu; GONG Qing-yue; LI Tie-jun; WANG Hong-yun; BAO Jian-yang (College of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine; The Second Jiangsu Provincial Hospital of Chinese Medicine, The Second Affiliated Hospital of Nanjing University of Chinese Medicine; College of Nursing, Nanjing University of Chinese Medicine, Nanjing 210046, China)
Source: Software Guide (软件导刊), 2022, No. 12, pp. 58-62 (5 pages)
Keywords: information extraction; named entity recognition; famous traditional Chinese medicine electronic medical record; RoBERTa-wwm; vocabulary enhancement