Abstract
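Named entity recognition (NER) is a fundamental task in natural language processing. The mainstream approach to Chinese NER is to enhance intra-word semantics and word-boundary information with lexicon-based methods. However, Chinese characters evolved from pictographs, and their glyphs carry rich entity information that is rarely used in this task. This paper proposes a Chinese NER model based on lexicon and glyph features. The model combines word information and structural information in a unified way, which improves the accuracy of entity matching. It first enriches semantic information with the SoftLexicon method and optimizes character representations with an improved radical-level embedding. It then uses a gated convolutional network to strengthen the extraction of potential words and contextual information. Experiments on four benchmark datasets show that the lexicon- and glyph-based model achieves significant performance improvements over both traditional and state-of-the-art models.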
Named entity recognition is a fundamental task of natural language processing. Lexicon-based methods are the popular approach to enhancing the representation of semantic and boundary information for Chinese named entity recognition. To utilize glyphs, which contain rich entity information, we propose a novel Chinese named entity recognition model based on lexicon and glyph features. Specifically, the model enriches the semantic information through SoftLexicon and optimizes the character representation through an improved radical-level embedding. The resulting representation is fed into a gated convolutional network. Experiments on four benchmark datasets show that the proposed model achieves significant improvements over both traditional and state-of-the-art models.
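SoftLexicon is an existing lexicon-encoding method that the model builds on. It assigns each character four word sets, B, M, E and S. These sets hold the matched lexicon words in which the character appears at the beginning, in the middle, or at the end, or as a single-character word. The following is a minimal sketch of only this set-construction step. It assumes the lexicon is a plain Python set of strings. The helper name `softlexicon_sets` and the `max_len` cap on candidate word length are hypothetical and are not taken from the paper. The later steps are omitted: filling empty sets with a placeholder, pooling word embeddings into one vector per set, and concatenating the result with the character and radical embeddings.

```python
from typing import Dict, List, Set

# Position tags used by SoftLexicon: Begin, Middle, End, Single.
TAGS = ("B", "M", "E", "S")


def softlexicon_sets(sentence: str, lexicon: Set[str],
                     max_len: int = 4) -> List[Dict[str, Set[str]]]:
    """Return, for every character of `sentence`, its B/M/E/S word sets.

    Hypothetical helper: `max_len` (an assumption, not from the paper)
    caps the length of candidate substrings looked up in `lexicon`.
    """
    n = len(sentence)
    sets = [{tag: set() for tag in TAGS} for _ in range(n)]
    for i in range(n):
        # Enumerate substrings sentence[i:j] of at most max_len characters.
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if j - i == 1:
                sets[i]["S"].add(word)        # single-character word
            else:
                sets[i]["B"].add(word)        # character begins the word
                sets[j - 1]["E"].add(word)    # character ends the word
                for k in range(i + 1, j - 1):
                    sets[k]["M"].add(word)    # characters strictly inside
    return sets


if __name__ == "__main__":
    lexicon = {"中山", "山西", "西路", "中山西路"}
    for ch, s in zip("中山西路", softlexicon_sets("中山西路", lexicon)):
        print(ch, {tag: sorted(words) for tag, words in s.items()})
```

Take the input "中山西路" with the lexicon {中山, 山西, 西路, 中山西路}. The character 山 gets B = {山西}, M = {中山西路}, E = {中山}, and an empty S set. Two of these words conflict with each other (中山 versus 山西). Both are kept in the sets, and the downstream network decides between them. This is how a lexicon can supply boundary hints without committing to a single segmentation.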
Authors
于舒娟
毛新涛
张昀
黄丽亚
YU Shujuan, MAO Xintao, ZHANG Yun, HUANG Liya (College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023, China)
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal
2023, No. 3, pp. 112-122 (11 pages)
Funding
National Natural Science Foundation of China (61977039)
Keywords
Chinese named entity recognition
lexicon
glyph features