Abstract
This paper proposes a character-level Chinese named entity recognition (NER) method based on the Transformer and a hidden Markov model (HMM). The positional encoding function of the Transformer is modified so that it captures both the relative position of two characters and the direction between them. The character sequence encoded by the Transformer is used to compute a transition matrix and an emission matrix, from which an HMM is built to generate a set of soft named entity labels. These soft labels are fed into a Bert-NER model, whose parameters are updated with a divergence loss function. The Bert-NER model then outputs the final strong (hard) entity labels, which identify the named entities. In comparative experiments, the proposed method reaches F1 scores of 75.11% on the Chinese CLUENER-2020 dataset and 68% on the Weibo dataset, improving Chinese NER performance.
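The abstract does not give the modified positional encoding itself. One well-known way to make attention aware of both distance and direction, used for example in the TENER model, is to encode the signed offset t - j with sinusoids instead of encoding each absolute position. The sketch below, which is an illustration under that assumption and not the authors' function, shows why this matters: the dot product of two standard absolute encodings depends only on cos(w_i k), so offsets +k and -k look identical, while a sinusoidal encoding of the signed offset flips the sign of every sine component when the direction reverses. All function names are hypothetical.

```python
import math


def sinusoid(k, d_model):
    """Sinusoidal vector for an integer k (an absolute position or a signed offset).

    Layout: [sin(w_0 k), cos(w_0 k), sin(w_1 k), cos(w_1 k), ...],
    with w_i = 1 / 10000^(2i / d_model) as in the original Transformer.
    """
    if d_model % 2:
        raise ValueError("d_model must be even")
    vec = []
    for i in range(d_model // 2):
        w = 1.0 / (10000 ** (2 * i / d_model))
        vec.extend([math.sin(w * k), math.cos(w * k)])
    return vec


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def absolute_similarity(t, k, d_model=16):
    """Dot product of absolute encodings at positions t and t + k.

    Mathematically equal to sum_i cos(w_i k), so k and -k give the same value:
    distance is visible, direction is not.
    """
    return dot(sinusoid(t, d_model), sinusoid(t + k, d_model))


def relative_encoding(t, j, d_model=16):
    """Direction-aware relative encoding R_{t-j} built from the signed offset.

    sin is odd and cos is even, so R_{t-j} and R_{j-t} share their cosine
    components but have opposite sine components.
    """
    return sinusoid(t - j, d_model)


# The two similarities agree up to floating-point rounding (no direction).
print(absolute_similarity(5, 3), absolute_similarity(5, -3))
# The first (sine) component changes sign when the direction is reversed.
print(relative_encoding(8, 5)[0], relative_encoding(5, 8)[0])
```

Because R_{t-j} depends only on the offset, the same vector is produced for every pair of characters the same distance apart in the same direction.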
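The abstract states that an HMM built from a transition matrix and an emission matrix produces soft labels, but not how those matrices are derived from the Transformer output or how the soft labels are read off. The sketch below therefore takes the matrices as given inputs and assumes the soft label of each character is its posterior tag distribution P(y_t = k | x), computed with the standard scaled forward-backward algorithm. The function name and this interpretation are assumptions, not the paper's code.

```python
def hmm_soft_labels(pi, A, E):
    """Per-character soft labels P(y_t = k | x_1..x_T) via scaled forward-backward.

    pi: length-K initial tag probabilities
    A:  K x K transition matrix, A[i][j] = P(y_t = j | y_{t-1} = i)
    E:  T x K emission likelihoods, E[t][k] = P(x_t | y_t = k)
    Assumes strictly positive sequence likelihood (no all-zero rows).
    Returns a T x K list whose rows each sum to 1.
    """
    T, K = len(E), len(pi)
    # Forward pass; each alpha row is rescaled to sum to 1 to avoid underflow.
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [pi[k] * E[0][k] for k in range(K)]
        else:
            prev = alpha[-1]
            row = [sum(prev[i] * A[i][k] for i in range(K)) * E[t][k] for k in range(K)]
        c = sum(row)
        alpha.append([v / c for v in row])
        scale.append(c)
    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(A[i][j] * E[t + 1][j] * beta[t + 1][j] for j in range(K)) / scale[t + 1]
            for i in range(K)
        ]
    # Posterior marginals: normalize alpha * beta at each position.
    soft = []
    for t in range(T):
        row = [alpha[t][k] * beta[t][k] for k in range(K)]
        z = sum(row)
        soft.append([v / z for v in row])
    return soft


# Two tags, three characters; every row of the result is a probability distribution.
labels = hmm_soft_labels(
    [0.6, 0.4],
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.5, 0.1], [0.2, 0.8], [0.3, 0.3]],
)
print(labels)
```

In a real tagger K would be the size of the tag set (for example BIO tags per entity type); pure Python loops are used here only to keep the sketch dependency-free.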
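The abstract mentions only a "divergence loss function". Assuming it is the Kullback-Leibler divergence KL(p || q) between each character's soft label p and the Bert-NER tag distribution q = softmax(z), the loss and its gradient with respect to the logits z can be sketched as below. For this loss the gradient has the simple form q - p (divided by the number of tokens when averaging), which is what a parameter update would backpropagate. Reading the final strong labels as the argmax of q is also an assumption, and all names are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax over one token's tag logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def kl_soft_label_loss(soft_labels, logits):
    """Mean over tokens of KL(p_t || softmax(z_t)).

    soft_labels: T x K teacher distributions p_t (hypothetically, HMM soft labels)
    logits:      T x K student scores z_t (hypothetically, Bert-NER outputs)
    Returns (loss, grads) with grads[t][k] = dLoss/dz_t[k] = (q_t[k] - p_t[k]) / T.
    """
    T = len(soft_labels)
    loss, grads = 0.0, []
    for p, z in zip(soft_labels, logits):
        q = softmax(z)
        # Terms with p_k = 0 contribute nothing (convention 0 * log 0 = 0).
        loss += sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)
        grads.append([(qk - pk) / T for pk, qk in zip(p, q)])
    return loss / T, grads


def strong_labels(logits):
    """Hard tag index per token: the highest-scoring tag."""
    return [max(range(len(z)), key=z.__getitem__) for z in logits]


loss, grads = kl_soft_label_loss([[0.2, 0.5, 0.3]], [[0.1, -0.4, 0.7]])
print(loss, grads, strong_labels([[0.1, -0.4, 0.7]]))
```

Using soft targets rather than one-hot tags lets the student model learn from the HMM's uncertainty between competing tags; when p is one-hot the loss reduces to ordinary cross-entropy.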
Authors
LI Jian
XIONG Qi
HU Ya-ting
LIU Kong-yu
College of Information Technology, Jilin Agricultural University, Changchun 130118, China; Jilin Bioinformatics Research Center, Changchun 130118, China
Source
Journal of Jilin University (Engineering and Technology Edition)
2023, No. 5, pp. 1427-1434 (8 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
Funding
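Industrial Technology Research and Development Project of the Jilin Provincial Development and Reform Commission (2020C037-7)
Jilin Province Science and Technology Development Plan Project (20230508026RC)
Changchun Science and Technology Development Plan Project (21ZGN26)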