Abstract
To address the scarcity of entity corpora, heavy entity nesting, and the large number of varied entity types in the inspection and detection field, a named entity recognition method is proposed that combines the bidirectional encoder representations from transformers (BERT) pre-trained language model, a bidirectional gated recurrent unit (BIGRU) serving as a lightweight bidirectional encoder, and a conditional random field (CRF). The BERT-BIGRU-CRF (BGC) model first uses the pre-trained BERT model to produce word vectors that incorporate contextual semantics; the BIGRU layer then encodes them in both directions; finally, the CRF layer computes and outputs the optimal result. The model is trained on an inspection and detection field dataset containing four types of named entity: inspection organizations, inspection items, inspection standards, and inspection instruments. The experimental results show that the accuracy, recall, and F1 value of the BGC model are all better than those of the comparison models without BERT. In addition, compared with the BERT-BILSTM-CRF model, the BGC model shortens training time by 6%.
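The final step described in the abstract, in which the CRF layer outputs the optimal result, is commonly realized with Viterbi decoding over per-token tag scores and tag-to-tag transition scores. The following is a minimal sketch of that decoding step only, not the authors' implementation. The function name `viterbi_decode`, the three-tag set (O, B, I), and all score values are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag index path.

    emissions[t][k]   : score of tag k at position t (e.g. produced by an encoder)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best path score ending in each tag so far
    backptr = []                # backptr[t][j] = best previous tag for tag j

    for emit in emissions[1:]:
        step_ptr = []
        new_score = []
        for j in range(n_tags):
            # Pick the previous tag that maximizes the score of reaching tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            step_ptr.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        backptr.append(step_ptr)
        score = new_score

    # Trace back from the best final tag to recover the full path.
    best_last = max(range(n_tags), key=lambda k: score[k])
    path = [best_last]
    for step_ptr in reversed(backptr):
        path.append(step_ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set: 0 = O, 1 = B, 2 = I. The O -> I transition is penalized,
# so the decoder prefers B -> I over the locally best but invalid O -> I.
example_transitions = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 0.0],    # from B
    [0.0, 0.0, 0.0],    # from I
]
example_emissions = [
    [1.0, 0.8, 0.0],    # position 0: O scores slightly higher than B
    [0.0, 0.0, 2.0],    # position 1: I scores highest
]
example_path = viterbi_decode(example_emissions, example_transitions)
```

In this toy input, choosing the best tag at each position independently would yield the sequence O, I, while the transition penalty steers the joint decoding toward B, I. This illustrates why a sequence-level output layer can help with entity boundary consistency.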
Authors
苏展鹏
李洋
张婷婷
让冉
张龙波
蔡红珍
邢林林
SU Zhanpeng; LI Yang; ZHANG Tingting; RANG Ran; ZHANG Longbo; CAI Hongzhen; XING Linlin (School of Agricultural Engineering and Food Science, Shandong University of Technology, Zibo 255000; School of Computer Science and Technology, Shandong University of Technology, Zibo 255000)
Source
《高技术通讯》
CAS
2022, No. 7, pp. 749-755 (7 pages)
Chinese High Technology Letters
Funding
Supported by the National Key Research and Development Program of China (2018YFB1403302).
Keywords
named entity recognition
bidirectional encoder representations from transformers (BERT)
inspection and detection field
deep learning
bidirectional gated recurrent unit (BIGRU)