Abstract
Chinese entity recognition is difficult because word boundaries are ambiguous and the information obtainable from individual characters is limited. Exploiting the pictographic nature of Chinese characters, this paper proposes a character-information enhancement algorithm that incorporates glyph features: a convolutional neural network (CNN) and a BERT model are used to obtain enhanced character vectors. The paper also proposes a multi-granularity fusion embedding algorithm that uses an attention mechanism to fuse the enhanced character vectors with word vectors, and on this basis builds a Chinese entity recognition model with multi-granularity fusion embedding. Experiments show that this model outperforms other commonly used models on Chinese entity recognition.
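The abstract does not specify the model's layers, so the following is only a minimal pure-Python sketch of the two ideas it names, under stated assumptions. A glyph feature vector is taken from one convolution layer (hypothetical hand-set kernels, ReLU, global max pooling) over a character bitmap and concatenated with a placeholder BERT character vector to form an "enhanced" character vector. That vector then attends, via scaled dot-product attention, over vectors of lexicon words matched to the character, and the attended word context is concatenated to it. All function names, kernel values, dimensions, the attention form, and the use of concatenation are assumptions for illustration, not the authors' method; a real model would learn the kernels and projections and take the character vector from a pretrained BERT.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """2-D 'valid' cross-correlation (what a CNN layer computes) with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(w - kw + 1)
        ]
        for r in range(h - kh + 1)
    ]


def glyph_features(bitmap: Matrix, kernels: List[Matrix]) -> Vector:
    """One conv layer + ReLU + global max pooling: one feature per kernel.

    The kernels are hypothetical hand-set values; a trained CNN would learn them.
    """
    features = []
    for kernel in kernels:
        fmap = conv2d_valid(bitmap, kernel)
        features.append(float(max(max(0.0, v) for row in fmap for v in row)))
    return features


def concat(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Concatenate two vectors, e.g. a BERT character vector and a glyph vector."""
    return list(a) + list(b)


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_char_and_words(char_vec: Sequence[float], word_vecs: List[Sequence[float]]) -> Vector:
    """Attend from one enhanced character vector over the words matched to it.

    Assumes the word vectors were already projected to len(char_vec) dimensions.
    Returns [char_vec ; attention-weighted word context].
    """
    d = len(char_vec)
    if not word_vecs:
        # No lexicon word covers this character: use a zero context vector.
        return concat(char_vec, [0.0] * d)
    if any(len(w) != d for w in word_vecs):
        raise ValueError("word vectors must have the same dimension as char_vec")
    # Scaled dot-product scores, normalised into attention weights.
    scores = [sum(x * y for x, y in zip(char_vec, w)) / math.sqrt(d) for w in word_vecs]
    weights = softmax(scores)
    context = [sum(a * w[i] for a, w in zip(weights, word_vecs)) for i in range(d)]
    return concat(char_vec, context)


# Toy usage: a 3x3 "glyph" with one horizontal stroke and two stroke detectors.
bitmap = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
kernels = [[[1, 1]], [[1], [1]]]             # horizontal, vertical (hypothetical)
glyph_vec = glyph_features(bitmap, kernels)  # [2.0, 1.0]
bert_vec = [0.5, 0.1]                        # placeholder for a BERT output vector
enhanced = concat(bert_vec, glyph_vec)
fused = fuse_char_and_words(enhanced, [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])
print(len(fused))  # 8
```

Concatenation keeps the character's own features intact while appending lexicon information; the abstract does not say how the paper combines the two granularities after attention, so a weighted sum or gating would be equally plausible choices.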
Authors
YUAN Jian
ZHANG Hai-bo
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Source
Journal of Chinese Computer Systems
Indexed in: CSCD; Peking University Core Journals
2022, No. 4, pp. 741-746 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (61775139).