Abstract
Extracting knowledge from medical texts is of great significance for building applications such as medical auxiliary diagnosis systems, and entity recognition is a core step in this process. Most existing entity recognition models are deep learning models trained on annotated data and therefore rely heavily on large-scale, high-quality annotations. To make full use of existing medical-domain dictionaries and pretrained language models, this paper proposes a knowledge-fusion model for Chinese medical entity recognition. On one hand, domain knowledge is extracted from a domain dictionary; on the other hand, the pretrained language model BERT is introduced as a source of general knowledge, and both kinds of knowledge are then fused into the model. In addition, a convolutional neural network is introduced to improve the model's context modeling ability. Experiments on multiple datasets show that fusing knowledge into the model effectively improves Chinese medical entity recognition.
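As a rough illustration of the knowledge-fusion idea described in the abstract, the following PyTorch sketch concatenates per-character representations (a plain embedding table standing in for BERT output) with dictionary-match features, applies a 1D convolution for local context modeling, and scores each token with a linear tagging layer. The class name KnowledgeFusionNER, the feature dimensions, and the dictionary-feature encoding are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class KnowledgeFusionNER(nn.Module):
    """Minimal sketch: fuse character vectors with dictionary features, then CNN + tagger."""
    def __init__(self, vocab_size, num_tags, dict_feat_dim=8,
                 char_dim=768, conv_dim=256, kernel_size=3):
        super().__init__()
        # Stand-in for BERT: in the paper, contextual character vectors would
        # come from a pretrained Chinese BERT encoder, not a lookup table.
        self.char_emb = nn.Embedding(vocab_size, char_dim)
        # Dictionary-match features (e.g., flags marking characters covered by
        # entries of a medical lexicon) projected to a small dense vector.
        self.dict_proj = nn.Linear(dict_feat_dim, dict_feat_dim)
        # 1D convolution over the fused sequence to model local context.
        self.conv = nn.Conv1d(char_dim + dict_feat_dim, conv_dim,
                              kernel_size, padding=kernel_size // 2)
        self.classifier = nn.Linear(conv_dim, num_tags)

    def forward(self, char_ids, dict_feats):
        # char_ids: (batch, seq_len); dict_feats: (batch, seq_len, dict_feat_dim)
        chars = self.char_emb(char_ids)                    # (B, T, char_dim)
        feats = torch.relu(self.dict_proj(dict_feats))     # (B, T, dict_feat_dim)
        fused = torch.cat([chars, feats], dim=-1)          # (B, T, char_dim + dict_feat_dim)
        h = torch.relu(self.conv(fused.transpose(1, 2)))   # (B, conv_dim, T)
        return self.classifier(h.transpose(1, 2))          # (B, T, num_tags)

# Toy usage: a 4-character input with 8-dim dictionary features and 5 tag labels.
model = KnowledgeFusionNER(vocab_size=21128, num_tags=5)
logits = model(torch.randint(0, 21128, (1, 4)), torch.rand(1, 4, 8))
print(logits.shape)  # torch.Size([1, 4, 5])
```

In practice the per-token logits would feed a CRF or softmax decoder for sequence labeling; this sketch only shows how the two knowledge sources could be fused before context modeling.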
Authors
刘龙航 (LIU Longhang), 赵铁军 (ZHAO Tiejun)
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Source
《智能计算机与应用》 (Intelligent Computer and Applications)
2021, No. 3, pp. 94-97 (4 pages)
Keywords
entity recognition
sequence labeling model
knowledge fusion