Abstract
Named entities are the main carriers of medical knowledge in electronic medical records (EMRs), so clinical named entity recognition (CNER) has become one of the fundamental tasks in clinical text analysis and processing. Owing to the particularities of medical text structure and of the Chinese language, CNER for Chinese EMRs remains highly challenging. This paper proposes a Chinese clinical named entity recognition method based on a multi-head self-attention neural network. The method introduces a novel character-level feature representation that incorporates a domain dictionary. Building on the BiLSTM-CRF model, it adds a multi-head self-attention mechanism to capture features from several aspects, such as latent dependency weights between characters and their contextual and semantic relationships, thereby improving the recognition of Chinese clinical named entities. Experimental results show that the proposed method outperforms existing methods in recognition performance.
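The multi-head self-attention step named in the abstract can be illustrated with a minimal pure-Python sketch of generic scaled dot-product attention. This is not the authors' implementation: the abstract does not give layer sizes, head counts, or the exact placement of the attention layer relative to the BiLSTM and CRF layers. All function names, the dimensions, and the randomly initialised projection matrices (standing in for learned parameters) are assumptions made for illustration only.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights in place of trained ones."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def multi_head_self_attention(x, num_heads, rng):
    """x is a seq_len x d_model matrix of per-character features
    (assumed here to come from an upstream encoder such as a BiLSTM).
    Returns (output, weights): output is seq_len x d_model, and weights
    holds one seq_len x seq_len attention matrix per head."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    head_outputs, weights = [], []
    for _ in range(num_heads):
        # Each head has its own query, key, and value projections.
        w_q = random_matrix(d_model, d_head, rng)
        w_k = random_matrix(d_model, d_head, rng)
        w_v = random_matrix(d_model, d_head, rng)
        q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
        # Scaled dot-product scores between every pair of characters.
        k_t = [list(col) for col in zip(*k)]
        scores = matmul(q, k_t)
        attn = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(attn, v))
        weights.append(attn)
    # Concatenate the heads along the feature axis, then project back to d_model.
    concat = [sum((h[i] for h in head_outputs), []) for i in range(len(x))]
    w_o = random_matrix(d_model, d_model, rng)
    return matmul(concat, w_o), weights

# Usage: a 5-character sequence with 8-dimensional features and 2 heads.
rng = random.Random(0)
x = random_matrix(5, 8, rng)
out, weights = multi_head_self_attention(x, num_heads=2, rng=rng)
```

Each row of a head's weight matrix is a probability distribution over the characters in the sequence, which is one way to read the "dependency weights between characters" mentioned above. In a trained tagger the projections would be learned and the output would feed a sequence-labelling layer.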
Authors
罗熹
夏先运
安莹
陈先来
LUO Xi; XIA Xianyun; AN Ying; CHEN Xianlai (Big Data Institute, Central South University, Changsha 410083, China; Key Laboratory of Network Crime Investigation of Hunan Provincial Colleges, Hunan Police Academy, Changsha 410138, China)
Source
《湖南大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core Journals (北大核心)
2021, No. 4, pp. 45-55 (11 pages)
Journal of Hunan University:Natural Sciences
Funding
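Natural Science Foundation of Hunan Province (2018JJ2534)
Open Fund of the Key Laboratory of Network Crime Investigation of Hunan Provincial Colleges (2020WLFZZC003)
National Key Research and Development Program of China (2016YFC0901705)
Major Science and Technology Special Project of Hunan Province (2017SK1040)
High-Tech Industry Science and Technology Innovation Leading Program (2020GK2029)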
Keywords
Chinese electronic medical records
named entity recognition
long short-term memory
multi-head self-attention