Abstract
Recently, the emergence of the ChatGPT model, a knowledge-graph-based question-answering system, has attracted widespread attention; however, Chinese automatic question-answering systems are still at an early stage. Because of the complexity of the Chinese context, characterized by unclear word boundaries, strong context dependence, numerous near-synonyms and polysemous words, and nested entities, directly applying advanced models developed for English often yields unsatisfactory results. To address the characteristics of Chinese medical vocabulary and context, this paper introduces a self-attention mechanism on top of the pre-trained bidirectional encoder representation (BERT) language model and combines it with a BiLSTM + CRF model for Chinese named entity recognition, so as to strengthen the relationships between word vectors as well as the character-level relationships within them. Experimental results show that the proposed model achieves high F1 scores on both nested-entity and non-nested-entity datasets, demonstrating good adaptability to the Chinese medical context.
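The abstract outlines a pipeline of BERT embeddings, a self-attention layer, a BiLSTM encoder, and a CRF decoder. The following is a minimal sketch of such an architecture, assuming PyTorch, HuggingFace transformers, and the pytorch-crf package; the class name, layer sizes, and checkpoint are illustrative assumptions and do not reproduce the authors' implementation or hyperparameters.

# Minimal sketch of a BERT + self-attention + BiLSTM + CRF tagger (assumed setup,
# not the paper's released code). Requires: torch, transformers, pytorch-crf.
import torch
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF

class BertAttnBiLstmCrf(nn.Module):
    def __init__(self, num_tags, bert_name="bert-base-chinese",
                 lstm_hidden=256, attn_heads=8):
        super().__init__()
        # Pre-trained bidirectional encoder yields contextual character embeddings.
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # Self-attention over BERT outputs to strengthen character-to-character
        # interactions within the sequence.
        self.self_attn = nn.MultiheadAttention(hidden, attn_heads, batch_first=True)
        # BiLSTM models left-to-right and right-to-left dependencies.
        self.bilstm = nn.LSTM(hidden, lstm_hidden, batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * lstm_hidden, num_tags)
        # CRF decodes a globally consistent tag sequence (e.g. BIO labels).
        self.crf = CRF(num_tags, batch_first=True)

    def _emissions(self, input_ids, attention_mask):
        x = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=(attention_mask == 0))
        x = x + attn_out                      # residual connection around attention
        x, _ = self.bilstm(x)
        return self.emit(x)

    def loss(self, input_ids, attention_mask, tags):
        emissions = self._emissions(input_ids, attention_mask)
        # Negative log-likelihood of the gold tag sequence under the CRF.
        return -self.crf(emissions, tags, mask=attention_mask.bool(), reduction="mean")

    def decode(self, input_ids, attention_mask):
        emissions = self._emissions(input_ids, attention_mask)
        # Viterbi decoding; returns one tag-index list per sequence.
        return self.crf.decode(emissions, mask=attention_mask.bool())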
Authors
ZHENG Shengnan (郑胜男)
LIU Sheng (柳圣)
JU Wenhui (鞠文慧)
QIAN Wenquan (钱文泉)
College of Computer Science and Software Engineering, Hohai University, Nanjing 210024, China; School of Computer Engineering, Nanjing Institute of Technology, Nanjing 211167, China
Source
Journal of Nanjing Institute of Technology (Natural Science Edition)(《南京工程学院学报(自然科学版)》)
2023, No. 4, pp. 37-40 (4 pages)
Keywords
knowledge graph
question and answer system
entity extraction
medical information