Abstract
Named entity recognition (NER) is a fundamental task in natural language processing. Traditional machine learning approaches to the task rely mainly on domain expertise and hand-crafted features. To obtain good results without hand-crafted features, this paper proposes an NER method that ensembles character-based and word-based BiLSTM models. First, BiLSTM-CRF is trained to obtain a character-based model, Char-NER, and a word-based model, Word-NER. The score vectors produced by the two models are then combined through arithmetic operations and concatenated, and the concatenated vector is fed as features to an SVM, which fuses Char-NER and Word-NER. Experimental results show that, without hand-crafted features, the method achieves F-scores for person, location, and organization names of 94.04%, 92.15%, and 87.05% on the 1998 People's Daily corpus and 91.73%, 93.20%, and 83.15% on the MSRA corpus.
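The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which operations are applied to the score vectors before concatenation, so the element-wise difference and product used here are assumptions, as are the function names `fuse_scores` and `build_features`. The sketch also assumes the word-level scores have already been projected onto characters, so that both models yield one score vector per character over the same label set.

```python
from typing import List, Sequence


def fuse_scores(char_scores: Sequence[float],
                word_scores: Sequence[float]) -> List[float]:
    """Build one fused feature vector for a single character.

    Assumption: the "operation" step is an element-wise difference and
    product; the source abstract does not specify it.
    """
    if len(char_scores) != len(word_scores):
        raise ValueError("both models must score the same label set")
    diff = [c - w for c, w in zip(char_scores, word_scores)]
    prod = [c * w for c, w in zip(char_scores, word_scores)]
    # Concatenate: raw Char-NER scores, raw Word-NER scores, then derived terms.
    return list(char_scores) + list(word_scores) + diff + prod


def build_features(char_sentence: Sequence[Sequence[float]],
                   word_sentence: Sequence[Sequence[float]]) -> List[List[float]]:
    """Fuse the per-character score vectors of one sentence."""
    if len(char_sentence) != len(word_sentence):
        raise ValueError("score sequences must be aligned per character")
    return [fuse_scores(c, w) for c, w in zip(char_sentence, word_sentence)]


# Hypothetical scores for a two-character sentence and a two-label set.
char_out = [[0.5, 2.0], [1.0, 0.0]]
word_out = [[1.5, 1.0], [0.5, 0.5]]
features = build_features(char_out, word_out)
```

Each row of `features` could then be paired with the gold label of its character and passed to an SVM trainer, for example the `fit(X, y)` method of scikit-learn's `sklearn.svm.SVC`, to learn the fusion of the two models.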
Authors
YIN Zhangzhi (殷章志)
LI Xinzi (李欣子)
HUANG Degen (黄德根)
LI Jiuyi (李玖一)
School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2019, No. 11, pp. 95-100, 106 (7 pages in total)
Funding
National Natural Science Foundation of China (61672127)