Domain Entity Recognition Based on Pre-trained BERT Character Embedding (cited by: 9)
Abstract: As medical informatization advances, a growing volume of medical information is recorded digitally, and these records contain a wealth of medical knowledge. Extracting and using this massive body of medical text effectively has become a major challenge for medical informatization. To address the shortage of annotated medical text and the fuzzy boundaries of medical entities, this paper proposes a character embedding language representation model pre-trained on a large body of medical literature. The BERT model is pre-trained on this literature to obtain the EMR-BERT model. EMR-BERT then encodes the training text as character embedding vectors, which are fed into a Bi-LSTM model, and a CRF model produces the final output. Multiple sets of comparative experiments show that the EMR-BERT+Bi-LSTM+CRF model outperforms current mainstream models. The model can therefore effectively address insufficient annotated data and fuzzy entity boundaries in named entity recognition for medical electronic records.
Authors: DING Long (丁龙); WEN Wen (文雯); LIN Qiang (林强) (School of Computer, University of South China, Hengyang 421001, China)
Source: Technology Intelligence Engineering (《情报工程》), 2019, No. 6, pp. 65-74 (10 pages)
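Funding: Outstanding Youth Project of the Hunan Provincial Department of Education (18B279); Hunan Provincial Philosophy and Social Science Project (16YBA323); Hunan Provincial "Postgraduate Research Innovation" Project (CX20190737); University of South China Postgraduate Teaching Reform Project (2016JG029).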
Keywords: medical electronic records; named entity recognition; EMR-BERT; character embedding; Bi-LSTM; CRF
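
The last stage of the pipeline described in the abstract, a CRF layer that turns per-character scores into a tag sequence, can be illustrated with a minimal Viterbi decoding sketch. This is not the authors' implementation. The tag set, the emission scores (standing in for Bi-LSTM outputs over EMR-BERT character embeddings), and the transition scores are all hypothetical values chosen for illustration; in a trained model they would be learned. The example shows how transition constraints can resolve a fuzzy entity boundary that per-character scores alone would get wrong.

```python
# Hypothetical BIO tag set; the abstract does not specify the paper's label scheme.
TAGS = ["O", "B-DIS", "I-DIS"]

def viterbi_decode(emissions, transitions, start):
    """Return the highest-scoring tag index sequence for one sentence.

    emissions[t][j]   score of tag j at character t (stand-in for Bi-LSTM output)
    transitions[i][j] score of moving from tag i to tag j
    start[j]          score of beginning the sequence with tag j
    """
    n_tags = len(start)
    # best[j] holds the best score of any path that ends in tag j so far.
    best = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # Pick the previous tag that maximizes the path score into tag j.
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag to recover the path.
    path = [max(range(n_tags), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

NEG = -1e4  # large penalty that effectively forbids a transition

# Hypothetical scores: "I-DIS" may not follow "O" and may not start a sentence.
transitions = [
    [0.0, 0.0, NEG],  # from O
    [0.0, 0.0, 0.0],  # from B-DIS
    [0.0, 0.0, 0.0],  # from I-DIS
]
start = [0.0, 0.0, NEG]

# Hypothetical emissions for a four-character input. At position 1 the
# per-character scores favor I-DIS, which would be an invalid entity start.
emissions = [
    [2.0, 0.5, 0.1],
    [0.2, 1.0, 1.5],
    [0.1, 0.3, 2.0],
    [0.1, 0.2, 1.8],
]

decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions, start)]
greedy = [TAGS[max(range(len(TAGS)), key=lambda j: row[j])] for row in emissions]
print(decoded)
print(greedy)
```

Picking the top-scoring tag independently at each character labels position 1 as I-DIS directly after O, which is not a valid entity. Viterbi decoding with the transition penalty instead begins the entity with B-DIS at that position.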