Abstract
【Purpose/Significance】Named entity recognition (NER) is a core component of medical record processing and is crucial to the accuracy and efficiency of electronic medical record processing. Chinese medical records pose additional challenges for NER because of the complexity of the Chinese language. An effective NER model for Chinese medical records is therefore of considerable value for improving information extraction and data processing workflows. 【Method/Process】This paper proposes a novel framework, NER-CMR (Chinese Medical Records Named Entity Recognition), which aims to overcome the limitations of existing NER methods on Chinese medical records. NER-CMR addresses the entity nesting and boundary identification problems of traditional NER by incorporating contextual information such as common contiguous words and phrases. Specifically, the framework extracts adjacency, co-occurrence, and dependency relations between characters from related words and phrases, and then fuses this information into the neural NER model. NER-CMR consists of a character encoding module, a word embedding module, a graph construction module, a fusion module, and a CRF module. 【Results/Conclusions】In comprehensive experiments on CCKS, a widely used Chinese medical record dataset, and DIABETES, a real-world Chinese diabetes dataset, NER-CMR outperformed the baseline models in recognition performance. In addition, as a Chinese NER framework that introduces graph neural networks, the model allows its modules to be replaced flexibly, which offers a new direction for research on named entity recognition in Chinese electronic medical records. 【Innovation/Limitations】The paper proposes network graphs based on a graph attention mechanism, designs a fusion layer for multi-graph fusion, and applies two strategies to handle the noise introduced by incorrect relations. However, it lacks case studies at the application level of smart medical systems.
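The abstract does not spell out how the character graphs are built, so the following is only a minimal sketch of one plausible reading: lexicon words matched in a sentence induce adjacency edges (consecutive characters inside a word) and co-occurrence edges (any two characters sharing a word). The function names, the example sentence, and the toy lexicon are all hypothetical and are not taken from the paper. The dependency graph is omitted because it would require a parser.

```python
from itertools import combinations


def match_words(sentence, lexicon, max_len=4):
    """Return (start, end) spans, end exclusive, of lexicon words in sentence."""
    spans = []
    for i in range(len(sentence)):
        # Only multi-character words (length >= 2) contribute edges.
        for j in range(i + 2, min(len(sentence), i + max_len) + 1):
            if sentence[i:j] in lexicon:
                spans.append((i, j))
    return spans


def build_graphs(sentence, lexicon):
    """Build character-level edge sets from matched words (illustrative only)."""
    adjacency = set()
    cooccurrence = set()
    for start, end in match_words(sentence, lexicon):
        # Adjacency: consecutive characters inside a matched word.
        for k in range(start, end - 1):
            adjacency.add((k, k + 1))
        # Co-occurrence: every pair of characters inside the same matched word.
        for a, b in combinations(range(start, end), 2):
            cooccurrence.add((a, b))
    return adjacency, cooccurrence


# Hypothetical example: nested words ("糖尿" inside "糖尿病") and an
# overlapping candidate ("病史") that makes the entity boundary ambiguous.
sentence = "患者有糖尿病史"
lexicon = {"患者", "糖尿", "糖尿病", "病史"}
adj, co = build_graphs(sentence, lexicon)
print(sorted(adj))
print(sorted(co))
```

In a full model, edge sets like these would typically be turned into adjacency matrices, passed through graph attention layers, and fused with the character encodings before CRF decoding; that pipeline is described only at a high level in the abstract and is not reproduced here.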
Authors
单涛
吴杰
景慎旗
叶继元
刘云
郭永安
SHAN Tao; WU Jie; JING Shenqi; YE Jiyuan; LIU Yun; GUO Yong'an (School of Information Management, Nanjing University, Nanjing 210023, China; Jiangsu Province Hospital, Nanjing 210096, China; College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
Source
《情报科学》
Peking University Core Journal (北大核心)
2024, No. 3, pp. 100-109, 117 (11 pages in total)
Information Science
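Funding
Jiangsu Provincial Special Program for Basic Research on Frontier Leading Technologies (BK20202001)
Jiangsu Provincial Key Research and Development Program (SJ221007).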
Keywords
named entity recognition
Chinese medical record
adjacency graph
attention mechanism
knowledge graph