Abstract
Neural language representation models such as BERT, pre-trained on large-scale text corpora, capture rich semantic information from plain text. In Chinese named entity recognition, however, performance suffers because Chinese named entities have complex structures, take varied forms, and are often polysemous. Because knowledge graphs provide rich structured facts that support language understanding, this paper proposes a Chinese named entity recognition method that integrates knowledge graph information, using the entities in the knowledge graph to strengthen the model's representation of external knowledge. Experimental results show that, compared with BiLSTM-CRF, BERT, and other methods, the proposed method improves recognition of Chinese named entities, reaching F1 scores of 95.4% and 93.4% on the MSRA dataset and an annotated Sohu News dataset, respectively.
Authors
YAN Zhi-gang (阎志刚); LI Cheng-cheng (李成城); LIN Min (林民)
College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, Inner Mongolia, China
Source
Journal of Shanxi Normal University (Natural Science Edition) (《山西师范大学学报(自然科学版)》)
2021, Issue 1, pp. 51-58 (8 pages)
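Funding
National Natural Science Foundation of China (61806103, 61562068)
Natural Science Foundation of Inner Mongolia (2017MS0607)
Inner Mongolia Ethnic Affairs Commission special support sub-project for Mongolian-language informatization (MW-2014-MGYWXXH-01)
Inner Mongolia Autonomous Region "Grassland Talents" Program, Young Innovative and Entrepreneurial Talent Project
Inner Mongolia Normal University Graduate Innovation Fund (CXJJS19151)
Inner Mongolia Autonomous Region Science and Technology Plan Project (JH20180175)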
Keywords
natural language processing
Chinese named entity recognition
knowledge graph
deep learning
knowledge embedding