Abstract
To improve the recognition rate of Chinese named entity recognition tasks, a multi-criteria fusion model was proposed. The character-based BERT (bidirectional encoder representations from transformers) language model was used as the language-information feature extraction layer and was connected to a multi-criteria shared connection layer and a conditional random field (CRF) layer to obtain the fusion model. A large-scale mixed Chinese corpus was then built, the model parameters were optimized, and a single GPU (graphics processing unit) device was used to complete the pre-training of the BERT language model. The fusion model was trained both independently and jointly on the MSRA-NER and RMRB-98-1 entity annotation sets, yielding a separate single-criterion Chinese named entity recognition model for each corpus and a multi-criteria fusion Chinese named entity recognition model. The results show that the multi-criteria fusion model can mine information shared across corpora and improve the recognition rate of Chinese named entities: its F1 values on the MSRA-NER and RMRB-98-1 entity annotation sets are 94.46% and 94.32%, respectively, outperforming other existing models.
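The architecture sketched in the abstract — a shared feature extractor feeding corpus-specific CRF heads — can be illustrated in miniature. The sketch below is not the authors' code: the class name `MultiCriteriaTagger`, the random projection weights, and the feature/tag dimensions are all placeholder assumptions, and in the paper the shared features would come from the character-based BERT encoder rather than being passed in directly. Only the Viterbi step is standard linear-chain CRF decoding.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Highest-scoring tag path for one sentence (standard CRF decoding).
    emissions:   (seq_len, num_tags) per-character tag scores
    transitions: (num_tags, num_tags) score of moving from tag i to tag j
    """
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()                      # best score ending in each tag
    backptr = np.zeros((seq_len, num_tags), dtype=int)
    for t in range(1, seq_len):
        # total[i, j] = best path ending in tag i, then transitioning to tag j
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):              # follow back-pointers
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

class MultiCriteriaTagger:
    """Placeholder sketch: one (projection, transition) head per corpus,
    all heads reading the same shared feature space."""
    def __init__(self, feat_dim, num_tags, corpora):
        rng = np.random.default_rng(0)               # untrained toy weights
        self.heads = {name: {"proj": rng.normal(size=(feat_dim, num_tags)),
                             "trans": np.zeros((num_tags, num_tags))}
                      for name in corpora}

    def decode(self, shared_features, corpus):
        head = self.heads[corpus]                    # corpus-specific CRF head
        emissions = shared_features @ head["proj"]   # project shared features
        return viterbi_decode(emissions, head["trans"])
```

In this layout, joint ("hybrid") training would update the shared encoder from every corpus while each annotation standard keeps its own CRF head, which is one common way to let the model exploit information shared across differently annotated corpora.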
Author
Cai Qing (Jiangsu Institute of Automation, Lianyungang 222061, China)
Source
Journal of Southeast University (Natural Science Edition)
2020, Issue 5, pp. 929-934 (6 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
Funding
Supported by the 13th Five-Year Plan Equipment Pre-Research Common Technology and Domain Fund (41412030902).
Keywords
named entity recognition
bidirectional encoder representations from transformers (BERT)
conditional random field (CRF)
multi-criteria learning