Abstract
Recognizing named entities in raw text is the first step in constructing a knowledge graph of the fresh egg supply chain, and it supports a variety of downstream natural language processing tasks. It organizes the information in the supply chain and provides a basis for food safety traceability. In raw text from the fresh egg supply chain, entity names are highly varied and feature information is extracted inadequately. To identify entities of pre-defined types quickly and accurately, a named entity recognition (NER) method based on the BERT-CRF model (bidirectional encoder representations from transformers-conditional random field) was proposed. The method used the BIO (begin, internal, other) labeling rule for sequence labeling, with character vectors and position vectors as input. The pre-trained BERT model was used to extract global features of the input sequence, and a CRF layer was added at the end of the model to introduce hard constraints, yielding a model framework suited to NER in the fresh egg supply chain domain. The proposed model was compared with three other NER models on a self-constructed dataset of 12,810 text corpus entries covering 5 major categories and 21 subcategories. The BERT-CRF model outperformed the other three models, with precision, recall and F1-score of 91.82%, 90.44% and 91.01%, respectively. Finally, experiments on a second self-constructed dataset (food-domain recipes) showed that the model had a certain generalization ability.
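The BIO labeling and the hard constraints mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the example sentence, the entity type names `PRODUCT` and `LOCATION`, and the helper names `bio_tags`, `is_valid_transition` and `is_valid_sequence` are hypothetical. The sketch tags each character of a sentence and then checks the kind of tag-transition rule a CRF layer is typically meant to capture (an inside tag may only continue an entity of the same type).

```python
from typing import List, Tuple


def bio_tags(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Tag each character of `text` with a B-/I-/O label.

    `spans` holds (start, end, entity_type) triples with `end` exclusive.
    """
    tags = ["O"] * len(text)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def is_valid_transition(prev: str, curr: str) -> bool:
    """An I-X tag may only follow B-X or I-X; all other moves are allowed."""
    if not curr.startswith("I-"):
        return True
    etype = curr[2:]
    return prev in (f"B-{etype}", f"I-{etype}")


def is_valid_sequence(tags: List[str]) -> bool:
    """Check every adjacent tag pair, treating the sequence start like O."""
    prev = "O"
    for tag in tags:
        if not is_valid_transition(prev, tag):
            return False
        prev = tag
    return True


# Hypothetical example: "eggs shipped to Beijing", one tag per character.
text = "鸡蛋运往北京"
spans = [(0, 2, "PRODUCT"), (4, 6, "LOCATION")]  # assumed entity types
tags = bio_tags(text, spans)

for char, tag in zip(text, tags):
    print(char, tag)
print("valid:", is_valid_sequence(tags))
```

In an actual CRF layer, such rules are usually expressed through transition scores between tags that are learned or masked during training and decoding, rather than checked after the fact as done here.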
Authors
刘新亮
张梦琪
谷情
任延昭
何东彬
高万林
LIU Xinliang; ZHANG Mengqi; GU Qing; REN Yanzhao; HE Dongbin; GAO Wanlin (College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; National Engineering Laboratory for Agri-product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China; Key Laboratory of Agricultural Informatization Standardization, Ministry of Agriculture and Rural Affairs, Beijing 100083, China)
Source
《农业机械学报》
EI
CAS
CSCD
PKU Core (Peking University core journal list)
2021, Issue S01, pp. 519-525 (7 pages)
Transactions of the Chinese Society for Agricultural Machinery
Funding
Science and Technology Program Project of the Beijing Municipal Science and Technology Commission (Z191100008619007)
Keywords
fresh egg supply chain
named entity recognition
pre-training model
conditional random field