Abstract
Named entity recognition (NER) for typical cultural relics extracts entities such as relic name, dynasty, excavation site, and holding institution from sentences. Because typical cultural relic data has distinctive word-formation patterns, existing NER methods applied to typical cultural relic datasets run into problems such as incorrect word-boundary decisions. This paper proposes a lexicon-enhanced NER algorithm for typical cultural relics that introduces lexical information in both the input representation layer and the context encoding layer, improving the domain specificity of the word representations. A cultural relic domain lexicon is constructed and used as the dictionary for the algorithm, which largely resolves the word-boundary errors and achieves good results on the typical cultural relic dataset.
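The abstract does not spell out how lexical information enters the input representation layer. As a rough illustration only, the sketch below assumes a SoftLexicon-style scheme, in which each character collects the lexicon words that begin (B), pass through (M), end (E) at it, or consist of it alone (S). The function names, the `max_len` parameter, and the sample lexicon entries are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def match_lexicon(sentence, lexicon, max_len=8):
    """Return every (start, end, word) span of `sentence` found in `lexicon`.

    `end` is exclusive. `max_len` caps the candidate word length
    (an assumed setting, not a value from the paper).
    """
    spans = []
    for start in range(len(sentence)):
        last = min(len(sentence), start + max_len)
        for end in range(start + 1, last + 1):
            word = sentence[start:end]
            if word in lexicon:
                spans.append((start, end, word))
    return spans


def bmes_word_sets(sentence, lexicon, max_len=8):
    """For each character, group matched words by the character's position in them.

    B: the word begins at this character, M: the character is inside the word,
    E: the word ends at this character, S: the word is this single character.
    """
    sets = [defaultdict(set) for _ in sentence]
    for start, end, word in match_lexicon(sentence, lexicon, max_len):
        if end - start == 1:
            sets[start]["S"].add(word)
            continue
        sets[start]["B"].add(word)
        sets[end - 1]["E"].add(word)
        for i in range(start + 1, end - 1):
            sets[i]["M"].add(word)
    # Materialise all four keys so every character has the same structure.
    return [{tag: set(s[tag]) for tag in "BMES"} for s in sets]


if __name__ == "__main__":
    # Hypothetical domain lexicon entries, for illustration only.
    lexicon = {"后母戊鼎", "河南", "安阳", "河南安阳"}
    sentence = "后母戊鼎出土于河南安阳"
    for char, groups in zip(sentence, bmes_word_sets(sentence, lexicon)):
        print(char, groups)
```

In a scheme of this kind, the word sets would typically be embedded, pooled, and concatenated with each character embedding before the context encoder, so that a domain term such as a relic name contributes boundary evidence even when a general-purpose segmenter would split it.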
Authors
CUI Xin (崔鑫)
WANG Yan (王琰)
HOU Xiaogang (侯小刚)
ZHOU Yue (周月)
Beijing University of Posts and Telecommunications, Beijing 100876
Source
Journal of Communication University of China: Science and Technology (《中国传媒大学学报(自然科学版)》)
2023, Issue 2, pp. 51-55 (5 pages)
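Funding
National Key R&D Program of China project "Research on Engineering Methods and Data Processing Technologies for Cultural Resource Big Data Services" (2021TFF0901701).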
Keywords
lexicon enhanced (词汇增强)
domain thesaurus (领域词库)
named entity recognition (命名实体识别)