Alarm text named entity recognition based on BERT (基于BERT的警情文本命名实体识别)

Cited by: 42
Abstract To address the difficulty of recognizing key entity information in police alarm texts, a neural network model based on BERT (Bidirectional Encoder Representations from Transformers), named BERT-BiLSTM-Attention-CRF, is proposed to recognize and extract the relevant named entities, and entity annotation specifications are designed for the different case types. The model replaces static word vectors trained with traditional methods such as Skip-gram and Continuous Bag of Words (CBOW) with BERT pre-trained word vectors, which improves the representation ability of the word vectors and also resolves the word-boundary segmentation problem that arises when a Chinese corpus is trained with character vectors. An attention mechanism is also used to improve BiLSTM-CRF, the classic Named Entity Recognition (NER) architecture. On the test set, the BERT-BiLSTM-Attention-CRF model reaches an accuracy of 91%, which is 7% higher than the CRF++ baseline and also higher than the 86% accuracy of the BiLSTM-CRF model; the F1 values for entities such as person names, loss amounts, and handling methods are all above 0.87.
Authors WANG Yue (王月); WANG Mengxuan (王孟轩); ZHANG Sheng (张胜); DU Wen (杜渂) (DS Information Technology Company Limited, Shanghai 200032, China; The First Research Institute of Telecommunications Technology, Shanghai 200032, China)
Source Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2020, No. 2, pp. 535-540 (6 pages)
Funding Shanghai Special Fund for Informatization Development (Big Data Development) (201901043); Shanghai Special Fund for Industrial Transformation and Upgrading (Industrial Technology Innovation) (JJ-YJCX-01-18-3418)
Keywords alarm text; Named Entity Recognition (NER); pre-trained language model; annotation specification; word vector
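
The abstract names a CRF layer as the last stage of the BERT-BiLSTM-Attention-CRF model. As a rough illustration of what such a layer contributes at decoding time, the following is a minimal pure-Python sketch of Viterbi decoding for a linear-chain CRF. It is not the authors' implementation: the BIO tag set, the `PER` entity type, the emission scores (standing in for the per-character output of the encoder), and the transition scores are all hypothetical values chosen for illustration, and start and end transitions are left out for brevity.

```python
from typing import Dict, List, Tuple

NEG_INF = float("-inf")


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag path under a linear-chain CRF.

    emissions[t][tag] is the encoder's score for `tag` at position t.
    transitions[(prev, cur)] is the score for moving from prev to cur;
    a missing pair is treated as forbidden (score -inf).
    """
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0].get(tag, NEG_INF) for tag in tags}]
    back: List[Dict[str, str]] = []  # back[t-1][tag]: best previous tag
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev_best, prev_score = tags[0], NEG_INF
            for prev in tags:
                s = best[t - 1][prev] + transitions.get((prev, cur), NEG_INF)
                if s > prev_score:
                    prev_best, prev_score = prev, s
            scores[cur] = prev_score + emissions[t].get(cur, NEG_INF)
            pointers[cur] = prev_best
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and scores (assumptions, not taken from the paper).
TAGS = ["O", "B-PER", "I-PER"]
# Every transition scores 0.0 except O -> I-PER, which is left out (forbidden).
TRANSITIONS = {(p, c): 0.0 for p in TAGS for c in TAGS if (p, c) != ("O", "I-PER")}
EMISSIONS = [
    {"O": 1.0, "B-PER": 0.8, "I-PER": 0.0},
    {"O": 0.2, "B-PER": 0.1, "I-PER": 1.5},
    {"O": 2.0, "B-PER": 0.0, "I-PER": 0.1},
]

decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
print(decoded)  # ['B-PER', 'I-PER', 'O']
```

Picking the top-scoring tag at each position independently would give O, I-PER, O, which is not a valid BIO sequence. The forbidden O to I-PER transition makes the decoder choose B-PER, I-PER, O instead, which is the kind of sequence-level consistency a CRF layer adds on top of per-character scores.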
