Abstract
Named entity recognition (NER) on medical records aims to convert the unstructured text of clinical electronic medical records into structured data, providing basic support for data mining in medical-domain tasks. This paper proposes a named entity recognition model for Chinese medical records based on ALBERT and a fusion model. First, the sample dataset is expanded by manual annotation, and ALBERT is fine-tuned on it. Second, a bidirectional long short-term memory network (BiLSTM) extracts global features of the text. Finally, a conditional random field (CRF) model assigns sequence labels to the named entities. Experimental results on a standard dataset show that the proposed method further improves the accuracy of named entity recognition on medical text and reduces time overhead.
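The final step of the pipeline described above, CRF sequence labeling, can be illustrated with a small sketch. This is not the authors' implementation: the ALBERT and BiLSTM stages are omitted, the emission scores are hand-written stand-ins for what a BiLSTM would output, and the tag set (`O`, `B-DIS`, `I-DIS`), the transition penalty, and the function names `viterbi` and `bio_to_spans` are all assumptions made for illustration. It shows how Viterbi decoding over emission and transition scores picks a valid tag path, and how that path turns raw text into structured entity spans.

```python
from typing import Dict, List, Tuple

def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag path for one sentence."""
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

def bio_to_spans(chars: List[str], tags: List[str]) -> List[Tuple[str, str, int, int]]:
    """Convert a BIO tag sequence into (text, label, start, end) spans."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last entity
        if start is not None and not (tag.startswith("I-") and tag[2:] == label):
            spans.append(("".join(chars[start:i]), label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans

# Hypothetical example: "患者有糖尿病" ("the patient has diabetes")
chars = list("患者有糖尿病")
tags = ["O", "B-DIS", "I-DIS"]  # assumed tag set, DIS = disease
# Hand-written emission scores; at position 3 the raw scores prefer I-DIS
emissions = [
    {"O": 2.0, "B-DIS": 0.0, "I-DIS": 0.0},
    {"O": 2.0, "B-DIS": 0.0, "I-DIS": 0.0},
    {"O": 2.0, "B-DIS": 0.0, "I-DIS": 0.0},
    {"O": 0.0, "B-DIS": 1.0, "I-DIS": 1.5},
    {"O": 0.0, "B-DIS": 0.0, "I-DIS": 2.0},
    {"O": 0.0, "B-DIS": 0.0, "I-DIS": 2.0},
]
# Assumed transition score: an entity may not continue directly after "O"
transitions = {("O", "I-DIS"): -10.0}

path = viterbi(emissions, transitions, tags)
spans = bio_to_spans(chars, path)
print(path)
print(spans)
```

In this toy input the per-character scores alone would start the entity with `I-DIS`, which is not a valid BIO sequence; the transition penalty is what steers decoding toward `B-DIS`. That is the general motivation for placing a CRF layer on top of per-token scores.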
Authors
陈杰
奚雪峰
皮洲
盛胜利
崔志明
Chen Jie; Xi Xuefeng; Pi Zhou; Victor S Sheng; Cui Zhiming (School of Electronic and Computer Engineering, Suzhou University of Science and Technology, Suzhou 215009, China; Suzhou Smart City Research Institute, Suzhou 215009, China; Computer Science Department, Texas Tech University, Texas 79431, USA)
Source
《南京师范大学学报(工程技术版)》
CAS
2021, No. 1, pp. 36-43 (8 pages)
Journal of Nanjing Normal University(Engineering and Technology Edition)
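Funding
National Natural Science Foundation of China (61673290, 61876217)
Jiangsu Province "Six Talent Peaks" High-Level Talent Project (XYDXX-086)
Suzhou Science and Technology Development Plan, Industry Foresight Project (SYG201817); 2020 Postgraduate Research Innovation Program of Jiangsu Province (KYCX20_2762).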