
Research on a named entity recognition method for electronic medical records based on RoBERTa and character-word fusion
Abstract: Named entity recognition (NER) for electronic medical records (EMRs) is an important means of medical information extraction. To improve the accuracy of the semantic information extracted from EMR text, an EMR NER algorithm based on RoBERTa (robustly optimized BERT pretraining approach) and character-word fusion is proposed. The algorithm first uses the pre-trained RoBERTa model to obtain character vectors that take full account of context. The text is then segmented into words, and Word2Vec is used to obtain word vectors. Finally, the two kinds of vectors are fused and fed into a bidirectional long short-term memory network (BiLSTM) for training, and a conditional random field (CRF) produces the predicted output. Comparative experiments on EMR datasets show that the proposed algorithm clearly outperforms classical EMR NER methods on all three evaluation metrics.
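The abstract does not say how the character and word vectors are fused. One common choice, shown here purely as an assumption, is to concatenate each character's vector with the vector of the segmented word that contains it. In this minimal sketch the function name `fuse_char_word`, the toy vectors (standing in for RoBERTa and Word2Vec outputs), and the zero vector used for out-of-vocabulary words are all hypothetical and not taken from the paper.

```python
from typing import Dict, List

Vector = List[float]


def fuse_char_word(
    words: List[str],
    char_vecs: List[Vector],
    word_table: Dict[str, Vector],
    word_dim: int,
) -> List[Vector]:
    """Concatenate each character vector with the vector of its containing word.

    Assumption: fusion is plain concatenation; the paper may use another scheme.
    """
    n_chars = sum(len(w) for w in words)
    if n_chars != len(char_vecs):
        raise ValueError("segmentation and character vectors are misaligned")

    unk = [0.0] * word_dim  # hypothetical fallback for words missing from the table
    fused: List[Vector] = []
    i = 0
    for word in words:
        word_vec = word_table.get(word, unk)
        for _ in word:  # every character in the word shares the word's vector
            fused.append(char_vecs[i] + word_vec)  # list concatenation
            i += 1
    return fused


# Toy inputs: a two-word segmentation, 2-d character vectors, 3-d word vectors.
words = ["患者", "头痛"]
char_vecs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
word_table = {"患者": [1.0, 0.0, 0.0]}  # "头痛" is deliberately out of vocabulary
fused = fuse_char_word(words, char_vecs, word_table, word_dim=3)
```

Each fused vector has the character dimension plus the word dimension (here 2 + 3 = 5), and the resulting per-character sequence is what would be passed on to a BiLSTM-CRF tagger.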
Authors: WANG Weidong (王卫东); ZHANG Zhifeng (张志峰); XU Jinhui (徐金慧); YANG Xibei (杨习贝) (School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212100, China)
Source: Journal of Jiangsu University of Science and Technology (Natural Science Edition), indexed in CAS and the Peking University Core Journals list, 2023, No. 2, pp. 47-52 (6 pages)
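Funding: National Natural Science Foundation of China (51609110, 51779110); Natural Science Foundation of Jiangsu Province (BK20191461); Jiangsu Province Six Talent Peaks Project (KTHY-064).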
Keywords: EMR named entity recognition; pre-trained model RoBERTa; bidirectional long short-term memory neural network; conditional random field; character-word fusion