Abstract
Traditional Chinese nested Named Entity Recognition (NER) models often have difficulty locating entity boundaries accurately. They also struggle with the blurred boundaries between Chinese characters and words. To address these problems, this paper proposes a nested NER model based on position embedding and multilevel result boundary prediction.

In the embedding layer, the position information of nested entities and the position information of the text are encoded together to generate an absolute position sequence. Because the model attends to the position information inherent in Chinese text, it can further mine the relationship between nested entities and characters and strengthen the connection between nested entities and the original text.

In the encoding layer, nested entities are initially identified through multilevel prediction, using a hidden matrix that excludes the optimal path.

In the decoding layer, the offsets of the entity boundaries are computed in the multilevel prediction layer and used to redetermine those boundaries. This improves the accuracy of Chinese nested entity recognition.

Experimental results were compared against the best value among the baseline models. On the medical domain dataset, the proposed model improves precision, recall, and F1 score by 0.34, 1.06, and 0.80 percentage points, respectively. On the daily domain dataset, it improves them by 11.90, 0.78, and 6.23 percentage points, respectively. These results show that the model performs well at recognizing Chinese nested named entities.
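The abstract does not give the model's equations, so the following is only a toy sketch of how "multilevel prediction with a hidden matrix that excludes the optimal path" and "boundary offset prediction" can work in principle. It is not the authors' implementation, and it rests on several assumptions.

- Each level is decoded with a plain Viterbi pass over hand-made emission and transition scores, not scores produced by a trained encoder.
- "Excluding the optimal path" is approximated by setting to -inf the entity tag each previous path chose at each position, so the next level must find a different, typically inner, entity. This is a simplification.
- Boundary refinement is modeled as adding rounded, predicted (start, end) offsets to each span.

The helper names (`bio_transitions`, `multilevel_decode`, `apply_offsets`), the BIO tag set, and all numbers are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

NEG_INF = -math.inf
TAGS = ["O", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]  # hypothetical BIO tag set
Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def bio_transitions(tags: Sequence[str]) -> List[List[float]]:
    """Score 0 for valid BIO moves, -inf when I-X follows anything but B-X/I-X."""
    trans = []
    for prev in tags:
        row = []
        for cur in tags:
            ok = not cur.startswith("I-") or prev[2:] == cur[2:]
            row.append(0.0 if ok else NEG_INF)
        trans.append(row)
    return trans


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag-index path (linear-chain CRF decoding)."""
    k = len(emissions[0])
    score = list(emissions[0])
    backptrs = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score, backptrs = new_score, backptrs + [ptr]
    path = [max(range(k), key=lambda j: score[j])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


def bio_to_spans(path: Sequence[int], tags: Sequence[str]) -> List[Span]:
    """Convert a BIO tag-index path into (start, end_exclusive, label) spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label = ""
    for idx, tag_id in enumerate(list(path) + [None]):  # sentinel closes last span
        tag = "O" if tag_id is None else tags[tag_id]
        if start is not None and not (tag.startswith("I-") and tag[2:] == label):
            spans.append((start, idx, label))
            start = None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = idx, tag[2:]
    return spans


def multilevel_decode(emissions, transitions, tags, max_levels: int = 3) -> List[Span]:
    """Decode level by level, masking entity tags already used at each position."""
    scores = [list(row) for row in emissions]  # copy, because it is mutated below
    found: List[Span] = []
    for _ in range(max_levels):
        path = viterbi(scores, transitions)
        spans = bio_to_spans(path, tags)
        if not spans:  # an all-O path means no further nested entities
            break
        found.extend(spans)
        for pos, tag_id in enumerate(path):
            if tags[tag_id] != "O":  # O is never masked, so a finite path always exists
                scores[pos][tag_id] = NEG_INF
    return found


def apply_offsets(spans: Sequence[Span], offsets: Sequence[Tuple[float, float]],
                  seq_len: int) -> List[Span]:
    """Shift each span by rounded (d_start, d_end) offsets, clipped to the sentence."""
    adjusted = []
    for (start, end, label), (d_start, d_end) in zip(spans, offsets):
        new_start = min(max(0, start + round(d_start)), seq_len - 1)
        new_end = min(max(new_start + 1, end + round(d_end)), seq_len)
        adjusted.append((new_start, new_end, label))
    return adjusted


if __name__ == "__main__":
    # Hand-made scores: a 4-character ORG whose first two characters form a LOC.
    emissions = [
        [0.1, 2.0, 0.0, 3.0, 0.0],
        [0.1, 0.0, 2.0, 0.0, 3.0],
        [1.0, 0.0, 0.0, 0.0, 3.0],
        [1.0, 0.0, 0.0, 0.0, 3.0],
    ]
    print(multilevel_decode(emissions, bio_transitions(TAGS), TAGS))
    # -> [(0, 4, 'ORG'), (0, 2, 'LOC')]
    print(apply_offsets([(1, 4, "LOC")], [(-0.9, -1.6)], seq_len=4))
    # -> [(0, 2, 'LOC')]
```

In this toy run, level 1 finds the outer ORG span and level 2, with those tags masked, finds the nested LOC span. Level 3 decodes an all-O path and stops. The offset step shows how a span predicted one character too far to the right can be pulled back. Note that Python's `round` uses round-half-to-even, so offsets of exactly ±0.5 may not move a boundary.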
Authors
DUAN Jianyong
ZHU Yifei
WANG Hao
HE Li
LI Xin
Affiliations: School of Information, North China University of Technology, Beijing 100144, China; CNONIX National Standard Application and Promotion Laboratory, Beijing 100144, China
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journal (北大核心)
2023, Issue 12, pp. 71-77 (7 pages)
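Funding
National Natural Science Foundation of China (61972003)
Humanities and Social Sciences Foundation of the Ministry of Education of China (21YJA740052)
Scientific Research Program of the Beijing Municipal Education Commission (KM202210009002)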
Keywords
nested Named Entity Recognition (NER)
position embedding
Boundary Prediction Unit (BPU)
Conditional Random Field (CRF)
multilevel prediction