Abstract
Span-based frameworks have been proposed for nested named entity recognition with neural network models. Such a framework first generates span seeds and then builds a classifier to filter them. However, classifying each span region in isolation loses global semantic information. Moreover, in Chinese nested named entity recognition, the lack of word delimiters and the strong dependence of Chinese on context prevent span regions from making effective use of word boundary features, which degrades recognition performance. To address these problems, this paper proposes CEL, a Chinese nested named entity recognition model that incorporates entity labels. After generating span seeds, the model embeds entity labels at the start and end positions of the span region in the original sentence and feeds the result to the classifier, so that it can better learn the semantic dependencies between span region boundaries and the surrounding context. Experiments on the ACE2005 Chinese dataset show that the CEL model achieves a relatively good F1 score.
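The two steps described in the abstract (generating span seeds, then embedding labels at the span boundaries of the original sentence) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes span seeds come from exhaustive enumeration up to a maximum length, and that the entity labels are inserted as plain marker tokens. The names `enumerate_spans`, `mark_span`, `START_MARK`, `END_MARK` and the example sentence are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical boundary markers; the paper's actual entity label tokens
# are not given in the abstract.
START_MARK = "<e>"
END_MARK = "</e>"


def enumerate_spans(n_chars: int, max_len: int) -> List[Tuple[int, int]]:
    """Return every (start, end) character span, end exclusive,
    whose length is at most max_len (assumed seed-generation strategy)."""
    return [
        (start, end)
        for start in range(n_chars)
        for end in range(start + 1, min(start + max_len, n_chars) + 1)
    ]


def mark_span(chars: List[str], span: Tuple[int, int]) -> List[str]:
    """Insert the markers around one span while keeping the whole sentence,
    so a downstream classifier sees both the span and its context."""
    start, end = span
    return chars[:start] + [START_MARK] + chars[start:end] + [END_MARK] + chars[end:]


# Illustrative sentence: "贵州大学" is nested inside "贵州大学校长".
sentence = list("贵州大学校长")
spans = enumerate_spans(len(sentence), max_len=6)
classifier_inputs = [mark_span(sentence, span) for span in spans]
```

Each element of `classifier_inputs` is the full sentence with one candidate span marked, which is the form a span classifier would score. Keeping the whole sentence rather than cropping the span is what lets the classifier see context around the boundaries.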
Authors
PAN Lijun (潘丽君)
CHEN Yanping (陈艳平)
HUANG Ruizhang (黄瑞章)
QIN Yongbin (秦永彬)
School of Computer Science and Technology, Guizhou University, Guiyang 550025; Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025
Source
Computer & Digital Engineering (《计算机与数字工程》)
2022, No. 7, pp. 1522-1527, 1539 (7 pages in total)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 62166007).
Keywords
nested named entity recognition
neural network
span seeds
global semantic information
entity labels