Abstract
To address the inability of character-level named entity recognition (NER) methods to make full use of word information in a sentence, we propose a fine-grained NER model that fuses word information. By introducing an external lexicon, the model incorporates information about the potential words of a sentence into the character-based representation, which avoids the propagation of word segmentation errors, and constructs an enhanced character vector representation. A Flat-Lattice Transformer network models the character and word representations together with their positional relations, and a conditional random field (CRF) computes the optimal tag sequence. Experiments on the fine-grained named entity corpus CLUENER2020 yield a precision of 82.46%, a recall of 83.14%, and an F1 score of 82.80%, which verifies that fusing word information can improve fine-grained named entity recognition.
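The lexicon-matching step summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy lexicon, the example sentence, and the `max_word_len` cutoff are all assumptions made for illustration. The idea is that every character and every lexicon word found in the sentence becomes one token of a flat sequence, each carrying a head and a tail position, so that no single segmentation has to be chosen in advance. The four relative distances between two spans (head-head, head-tail, tail-head, tail-tail) are the kind of positional relation a flat-lattice Transformer can encode.

```python
from typing import List, Set, Tuple

# A lattice token: (text, head index, tail index), indices are character positions.
Span = Tuple[str, int, int]


def build_flat_lattice(sentence: str, lexicon: Set[str], max_word_len: int = 4) -> List[Span]:
    """Flatten characters and lexicon-matched words into one span sequence.

    Hypothetical helper: max_word_len is an assumed cutoff, not a value from the paper.
    """
    # Every character is a span whose head equals its tail.
    spans: List[Span] = [(ch, i, i) for i, ch in enumerate(sentence)]
    # Every substring of length >= 2 found in the lexicon is appended as a word span.
    for start in range(len(sentence)):
        last_end = min(len(sentence), start + max_word_len)
        for end in range(start + 2, last_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                spans.append((word, start, end - 1))
    return spans


def relative_distances(a: Span, b: Span) -> Tuple[int, int, int, int]:
    """Head-head, head-tail, tail-head and tail-tail distances between two spans."""
    return (a[1] - b[1], a[1] - b[2], a[2] - b[1], a[2] - b[2])


if __name__ == "__main__":
    toy_lexicon = {"重庆", "人和药店", "药店"}  # assumed toy lexicon
    for span in build_flat_lattice("重庆人和药店", toy_lexicon):
        print(span)
```

Because overlapping candidates such as "人和药店" and "药店" are both kept, the model is left to weigh them through attention, which is how this kind of design sidesteps errors from an upstream word segmenter.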
Authors
Cao Hui (曹晖)
Xu Yang (徐杨)
School of Big Data and Information Engineering, Guizhou University, Guiyang 550025, Guizhou, China; Guiyang Aluminum Magnesium Design & Research Institute Co., Ltd., Guiyang 550081, Guizhou, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal
2023, Issue 3, pp. 235-240 (6 pages)
Funding
Guizhou Provincial Science and Technology Program Project (黔科合LH字[2016]7429号)
Guizhou University Introduced Talent Project (2015-12).