Abstract
Named entity recognition is an important task in knowledge extraction. To make more effective use of lexicon matching information, a Chinese named entity recognition model based on matched-word weight optimization is proposed. First, a pre-trained model and a word segmentation tool are used to obtain the vector representation and part-of-speech tag of each character. Then, potential words are matched in a lexicon and weighted by an optimized weight derived from each matched word's frequency and document count, and the weighted words are combined with the character vectors to obtain a multi-feature fusion representation of each character. Finally, a bi-directional long short-term memory (Bi-LSTM) network is used for training, and a conditional random field (CRF) performs label inference to obtain the recognized entities. Experimental results show that the model achieves F1 scores of 95.55% and 85.39% on the Resume and Movie-Music-Book datasets, respectively, effectively improving Chinese named entity recognition.
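The lexicon-matching and weighting step can be illustrated with a minimal sketch. The abstract does not give the weight formula, so `word_weight` below is a hypothetical TF-IDF-style stand-in that combines word frequency with document count. The function names, the toy lexicon, and the toy vectors are all assumptions made for illustration. Part-of-speech features and the Bi-LSTM and CRF layers are left out.

```python
import math


def match_words(sentence, lexicon, max_len=4):
    """Return (start, end, word) for every lexicon word found in the sentence."""
    matches = []
    for start in range(len(sentence)):
        last_end = min(start + max_len, len(sentence))
        for end in range(start + 1, last_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                matches.append((start, end, word))
    return matches


def word_weight(word, word_freq, doc_count, num_docs):
    """Hypothetical weight (an assumption, not the paper's formula):
    word frequency scaled by an IDF-like factor from the document count."""
    tf = word_freq.get(word, 0)
    df = doc_count.get(word, 0)
    return tf * math.log(1 + num_docs / (1 + df))


def fuse_features(sentence, char_vecs, word_vecs, word_freq, doc_count, num_docs):
    """Concatenate each character vector with the weighted average of the
    vectors of all matched words that cover that character."""
    dim = len(next(iter(word_vecs.values())))
    matches = match_words(sentence, word_vecs)
    fused = []
    for i, ch in enumerate(sentence):
        covering = [w for s, e, w in matches if s <= i < e]
        weights = [word_weight(w, word_freq, doc_count, num_docs) for w in covering]
        total = sum(weights)
        pooled = [0.0] * dim  # stays all-zero when no word covers the character
        if total > 0:
            for w, wt in zip(covering, weights):
                for k in range(dim):
                    pooled[k] += wt / total * word_vecs[w][k]
        fused.append(list(char_vecs[ch]) + pooled)
    return fused


# Toy data, invented for illustration only.
sentence = "南京市长江大桥"
word_vecs = {"南京": [1.0, 0.0], "南京市": [0.0, 1.0], "长江大桥": [1.0, 1.0]}
word_freq = {"南京": 3, "南京市": 1, "长江大桥": 2}
doc_count = {"南京": 1, "南京市": 1, "长江大桥": 1}
char_vecs = {ch: [0.5] for ch in sentence}

fused = fuse_features(sentence, char_vecs, word_vecs, word_freq, doc_count, num_docs=3)
for ch, vec in zip(sentence, fused):
    print(ch, vec)
```

In a full model of the kind the abstract describes, each fused vector would then be fed to the Bi-LSTM encoder, with the CRF layer decoding the label sequence. Normalizing the weights per character keeps the pooled word feature on the same scale regardless of how many lexicon words match.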
Authors
戴高阳
孟小艳
张容祯
陈燕红
汪洋
DAI Gaoyang; MENG Xiaoyan; ZHANG Rongzhen; CHEN Yanhong; WANG Yang (School of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi 830052)
Source
Computer & Digital Engineering (《计算机与数字工程》)
2024, No. 2, pp. 521-527 (7 pages)
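Funding
Supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (No. 2019D01A50)
and the Key Research and Development Program of Xinjiang Uygur Autonomous Region (No. 2017B01006-1).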
Keywords
named entity recognition
recurrent neural network
conditional random field
dictionary matching
weight optimization