Abstract
Objective: To explore named entity recognition (NER) of medical entities in electronic medical records (EMR) in the field of traditional Chinese medicine (TCM) using a small amount of labeled corpora, and to provide a reference for more complex information processing of TCM EMRs and for the application of deep learning methods in TCM. Methods: The particularities of TCM EMR vocabulary and terminology relative to general NER tasks were analyzed, and the advantages and disadvantages of three current NER techniques were compared in order to identify the NER techniques best suited to the medical terminology of TCM EMRs. Results: As an unsupervised learning model, the long short-term memory (LSTM) neural network can effectively exploit long-distance dependency information in sequential data and is particularly well suited to processing text sequences; it can also be combined with a conditional random field (CRF) model to address the difficulties of NER in TCM. The combined LSTM-CRF model can learn word features without supervision from unlabeled medical record text and can automatically extract named entities such as patients' symptoms, diseases, and precipitating causes without relying on manually designed feature templates. Conclusion: Term recognition in TCM EMRs should draw on multiple NER techniques, making full use of their respective strengths to improve recognition accuracy.
Authors
孙超
谢晴宇
SUN Chao; XIE Qing-yu (School of Traditional Chinese Medicine, Capital Medical University, Beijing 100069, China; Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China)
Source
《中国中医药图书情报杂志》
2020, Issue 2, pp. 1-5 (5 pages)
Chinese Journal of Library and Information Science for Traditional Chinese Medicine
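Funding
Beijing Traditional Chinese Medicine "Torch Inheritance 3+3 Project" (薪火传承3+3工程), CUI Xi-zhang TCM Culture Inheritance Studio.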
Keywords
named entity recognition (NER)
long short-term memory (LSTM) neural network
conditional random field (CRF)
TCM electronic medical records (EMR)