Abstract
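[Objective] To address problems in entity recognition for Chinese electronic medical records, such as polysemy and incomplete word recognition. [Methods] The deep learning model RoBERTa-WWM-BiLSTM-CRF is used to improve named entity recognition on Chinese electronic medical records, and four groups of comparative experiments analyze how different models affect recognition performance. [Results] The proposed model achieves an F1 score of 0.8908. [Limitations] The dataset is small, and recognition performance is mediocre for some departments; for example, the F1 score for the respiratory department is only 0.8111. [Conclusions] The experiments show that the RoBERTa-WWM-BiLSTM-CRF model is better suited to named entity recognition on Chinese electronic medical records and effectively resolves the problems of polysemy and incomplete word recognition in this task.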
[Objective] This paper proposes an entity recognition model based on RoBERTa-wwm dynamic fusion, aiming to improve entity recognition in Chinese electronic medical records. [Methods] First, we merged the semantic representations generated by each Transformer layer of the pre-trained language model RoBERTa-wwm. Then, we fed the merged representation into a bi-directional long short-term memory network and a conditional random field module to recognize the entities in the electronic medical records. [Results] We evaluated the new model on the dataset of the "2017 National Knowledge Graph and Semantic Computing Conference (CCKS 2017)" and on self-annotated electronic medical records. The F1 values reached 94.08% and 90.08%, respectively, which were 0.23% and 0.39% higher than those of the RoBERTa-wwm-BiLSTM-CRF model. [Limitations] The RoBERTa-wwm used in this paper was pre-trained on a non-medical corpus. [Conclusions] The proposed method could improve the results of entity recognition tasks.
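The layer-merging step described in [Methods] can be illustrated with a minimal sketch. The abstract does not state how the per-layer representations are combined, so the softmax-weighted sum below is an assumption, and the names `softmax`, `fuse_layers`, `layer_vectors`, and `layer_scores` are hypothetical. In a trained model the scores would be learned parameters; here they are fixed numbers for illustration.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(
    layer_vectors: Sequence[Sequence[float]],
    layer_scores: Sequence[float],
) -> List[float]:
    """Merge one token's per-layer vectors into a single vector.

    Assumption: fusion is a softmax-weighted sum over layers.
    layer_vectors[k] is the token's hidden vector from layer k;
    layer_scores[k] is that layer's (hypothetical) learnable score.
    """
    if len(layer_vectors) != len(layer_scores):
        raise ValueError("need exactly one score per layer")
    weights = softmax(layer_scores)
    fused = [0.0] * len(layer_vectors[0])
    for weight, vector in zip(weights, layer_vectors):
        for i, value in enumerate(vector):
            fused[i] += weight * value
    return fused


# Toy example: two layers, two-dimensional hidden vectors.
# Equal scores give equal weights, so the result is the per-dimension mean.
fused_example = fuse_layers([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
```

Under this assumption, the fused vector for each token would then be passed to the BiLSTM and CRF layers in place of the final Transformer layer's output alone.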
Authors
张芳丛
秦秋莉
姜勇
庄润涛
Zhang Fangcong; Qin Qiuli; Jiang Yong; Zhuang Runtao (School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China; National Clinical Medical Research Center for Nervous System Diseases, Beijing Tiantan Hospital Affiliated to Capital Medical University, Beijing 100050, China; Community Health Service Center, Beijing Jiaotong University, Beijing 100044, China)
Source
《数据分析与知识发现》
CSSCI
CSCD
Peking University Core Journal (北大核心)
2022, Issue 2, pp. 242-250 (9 pages)
Data Analysis and Knowledge Discovery
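Funding
This work is one of the research outcomes of the following projects:
Humanities and Social Sciences Planning Project of the Ministry of Education (Project No. 18YJA870017)
Jilin Province Social Science Fund Project (Project No. 2019B59)
Jilin University Graduate Innovation Fund Project (Project No. 101832020CX279)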