Chinese electronic medical record named entity recognition based on sentence-level Lattice-long short-term memory neural network (cited by: 13)
Abstract Objective: To propose a conditional random field (CRF) model based on Re-entity, a new word segmentation method, and to compare it with the bi-directional long short-term memory neural network (BiLSTM)-CRF and the Lattice-long short-term memory neural network (LSTM). Methods: After comparing existing entity recognition methods and models, we proposed a CRF method based on Re-entity, BiLSTM-CRF, and Lattice-LSTM for task one of the 2018 China Conference on Knowledge Graph and Semantic Computing (CCKS2018), Chinese clinical named entity recognition, and trained character vector sets at different parameter levels on different corpora. These were introduced into the neural network models for comparative experiments on model performance. Finally, a comparative study was carried out on sentence-level and document-level input lengths. Results: With the best feature engineering, the performance of the CRF model improved after the Re-entity method was introduced. The sentence-level Lattice-LSTM model achieved a strict F1-measure of 89.75% on this task, higher than the best result on CCKS2018 task one (89.25%). Conclusion: The CRF model based on Re-entity can use a Chinese clinical drug knowledge base to effectively improve the recognition rate of drugs in electronic medical records. The Re-entity method can reduce the error accumulation caused by word segmentation during data preprocessing. The Lattice structure can better combine the latent semantic information of character and word sequences, and sentence-level input can effectively improve the recognition accuracy of neural network models.
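The strict F1-measure reported in the abstract can be illustrated with a minimal sketch. It assumes that "strict" means a predicted entity counts as correct only when its start offset, end offset, and type all match a gold entity exactly; the record itself does not spell out the scoring rule. The function name `strict_f1`, the `(start, end, type)` tuple layout, and the type labels in the usage note are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Tuple

# Hypothetical entity representation: (start offset, end offset, entity type).
Entity = Tuple[int, int, str]


def strict_f1(gold: Iterable[Entity], pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match scoring.

    Assumption: a prediction is a true positive only if its boundaries
    and its type are identical to those of a gold entity.
    """
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

As a usage example, with gold entities `[(0, 2, "drug"), (5, 9, "symptom"), (12, 15, "anatomy")]` and predictions `[(0, 2, "drug"), (5, 8, "symptom"), (12, 15, "anatomy"), (20, 22, "drug")]`, two predictions match exactly. The second prediction has the right type but a wrong end offset, so it earns no credit under strict matching. Precision is 2/4, recall is 2/3, and F1 is 4/7.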
Authors PAN Cui-ran (潘璀然); WANG Qing-hua (王青华); TANG Bu-zhou (汤步洲); JIANG Lei (姜磊); HUANG Xun (黄勋); WANG Li (王理) (Department of Medical Informatics, School of Medicine, Nantong University, Nantong 226001, Jiangsu, China; College of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Shenzhen 518055, Guangdong, China; Department of Rheumatology and Immunology, Changzheng Hospital, Naval Medical University (Second Military Medical University), Shanghai 200433, China; Department of Communication Engineering, School of Information Science and Technology, Nantong University, Nantong 226001, Jiangsu, China)
Source Academic Journal of Second Military Medical University (《第二军医大学学报》), indexed in CAS, CSCD, and the Peking University Core Journals list, 2019, No. 5, pp. 497-506 (10 pages)
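Funding National Key Research and Development Program of China (2018YFC0116902); National Natural Science Foundation of China (81873915); Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX17-1932)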
Keywords computerized medical records systems; Chinese electronic medical records; entity recognition; conditional random field; bi-directional long short-term memory neural network; lattice-long short-term memory neural network