Chinese medical entity recognition model with knowledge fusion (融合知识的中文医疗实体识别模型)


Abstract: Extracting knowledge from medical texts is important for applications such as computer-aided medical diagnosis systems, and entity recognition is a core step in that process. Most existing entity recognition models are deep learning models trained on annotated data, so they depend heavily on large-scale, high-quality annotations. To make full use of existing medical domain dictionaries and pre-trained language models, this paper proposes a Chinese medical entity recognition model with knowledge fusion. Domain knowledge is extracted from a domain dictionary, the pre-trained language model BERT is introduced as a source of general knowledge, and both kinds of knowledge are integrated into the model. In addition, a convolutional neural network is introduced to improve the model's context modeling ability. Experiments on multiple datasets show that fusing knowledge into the model effectively improves Chinese medical entity recognition.
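The abstract does not say how domain knowledge is extracted from the dictionary. As an illustration only, the sketch below shows one common way to turn a lexicon into per-character features for a sequence labeling model: forward maximum matching with B/M/E/S/O position tags. The function name `dictionary_features`, the tag scheme, the `max_len` parameter, and the toy lexicon are all assumptions made for this example, not details from the paper.

```python
from typing import List, Set


def dictionary_features(text: str, lexicon: Set[str], max_len: int = 8) -> List[str]:
    """Tag each character by its position inside a lexicon match.

    Hypothetical feature scheme (not taken from the paper):
    B/M/E mark the begin/middle/end of a multi-character match,
    S marks a single-character match, O marks no match.
    Matching is greedy, left to right, longest candidate first.
    """
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        matched = 0
        # Try the longest candidate starting at position i first.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in lexicon:
                matched = length
                break
        if matched == 0:
            i += 1
            continue
        if matched == 1:
            tags[i] = "S"
        else:
            tags[i] = "B"
            for j in range(i + 1, i + matched - 1):
                tags[j] = "M"
            tags[i + matched - 1] = "E"
        # Skip past the matched span so matches never overlap.
        i += matched
    return tags


if __name__ == "__main__":
    toy_lexicon = {"糖尿病", "胰岛素"}  # toy entries, for illustration only
    sentence = "患者患糖尿病需注射胰岛素"
    for char, tag in zip(sentence, dictionary_features(sentence, toy_lexicon)):
        print(char, tag)
```

In a model of the kind the abstract describes, tags like these would typically be mapped to embeddings and combined with the BERT character representations before the convolutional and labeling layers; how the paper actually combines them is not stated here.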

Authors: LIU Longhang (刘龙航); ZHAO Tiejun (赵铁军) (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)

Affiliation: School of Computer Science and Technology, Harbin Institute of Technology (哈尔滨工业大学计算机科学与技术学院)

Source: Intelligent Computer and Applications (《智能计算机与应用》), 2021, No. 3, pp. 94-97 (4 pages)

Keywords: entity recognition; sequence labeling model; knowledge fusion

Classification: R-05 [Medicine and Health]; TP391.1 [Automation and Computer Technology: Computer Application Technology]



