Abstract
Chinese named entity recognition comprises two tasks: flat named entity recognition and nested named entity recognition, the latter being the more difficult. This paper proposes TLEXNER, a unified model based on lexicon enhancement and table filling that handles both tasks at once. To address the difficulty of Chinese word segmentation, the model first uses a lexicon adapter to fuse lexical information into the pre-trained BERT model and integrates the relative position information between characters and lexical groups into the BERT embedding layer. It then uses conditional layer normalization and a biaffine model to construct and predict a character-pair table, which models the relations between characters and yields a unified representation of flat and nested entities. Finally, entity categories are determined from the values in the upper-triangular region of the character-pair table. The proposed model achieves F1 scores of 97.35% on the public flat-entity dataset Resume and 91.96% on a self-annotated nested-entity dataset in the military domain, demonstrating the effectiveness of TLEXNER.
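The final decoding step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cell convention (row = start character, column = end character, 0 = no entity), the label set, the example sentence, and the function name `decode_table` are all hypothetical. It shows only how one table can hold flat and nested entities together.

```python
from typing import Dict, List, Tuple


def decode_table(table: List[List[int]],
                 id2label: Dict[int, str]) -> List[Tuple[int, int, str]]:
    """Read entities from the upper triangle of an n x n character-pair table.

    Assumed convention (not taken from the paper): cell (i, j) with i <= j
    holds the label id of the span that starts at character i and ends at
    character j (inclusive); 0 means "no entity".
    """
    entities = []
    n = len(table)
    for i in range(n):
        for j in range(i, n):  # upper triangle, diagonal included
            label_id = table[i][j]
            if label_id != 0:
                entities.append((i, j, id2label[label_id]))
    return entities


# Hypothetical example: "北京" (LOC) is nested inside "北京大学" (ORG).
text = "北京大学校长"
id2label = {1: "LOC", 2: "ORG"}
n = len(text)
table = [[0] * n for _ in range(n)]
table[0][1] = 1  # span 0..1 -> "北京"
table[0][3] = 2  # span 0..3 -> "北京大学"

entities = decode_table(table, id2label)
spans = [(text[i:j + 1], label) for i, j, label in entities]
print(spans)
```

Because every (start, end) pair has its own cell, overlapping spans never compete for the same position, which is why a single table can represent both flat and nested entities.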
Authors
褚天舒
唐球
梁军学
徐睿
王明阳
刘涛
Chu Tianshu; Tang Qiu; Liang Junxue; Xu Rui; Wang Mingyang; Liu Tao (National Computer System Engineering Research Institute of China, Beijing 100083, China; People's Liberation Army 93216, Beijing 100085, China)
Source
《电子技术应用》
2024, No. 2, pp. 23-29 (7 pages)
Application of Electronic Technique
Keywords
lexicon enhancement
Chinese named entity recognition
table filling