Abstract
To address the high complexity of entity words and the low recognition precision in named entity recognition for the kiwifruit planting domain, an entity recognition method that fuses character-level and word-level semantic information is proposed. With BiGRU-CRF as the base model, information at the word level and the character level is fused. At the word level, word set information is introduced, and multi-head self-attention (MHA) is used to adjust the weights of the different words within each word set; an attention mechanism is also used to ignore unreliable word sets and focus on the important ones, which improves entity recognition. At the character level, the unsupervised pre-trained model BERT (bidirectional encoder representations from transformers) is introduced to enhance the semantic representation of characters. Experiments were conducted on a self-built kiwifruit planting corpus containing 12477 annotated samples and 7 entity categories; the results show that the F1 value of the proposed model is 1.58 percentage points higher than that of the SoftLexicon model. In addition, on the public ResumeNER dataset, the proposed model achieved the best results in comparison with Lattice-LSTM, WC-LSTM and other models, with an F1 value of 96.17%, which indicates that the model has a certain generalization ability.
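The word-level step can be illustrated with a minimal sketch. It assumes that the word sets follow the B/M/E/S grouping used by SoftLexicon (the abstract does not spell out the construction), and it replaces multi-head attention with a single unparameterized attention head for brevity. The function names, the toy lexicon and the example sentence are hypothetical and are not taken from the paper.

```python
import math


def build_word_sets(sentence, lexicon, max_len=4):
    """For each character, collect the lexicon words that begin at it (B),
    pass through it (M), end at it (E), or consist of it alone (S).
    Assumed SoftLexicon-style grouping, not confirmed by the abstract."""
    n = len(sentence)
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if j - i == 1:
                sets[i]["S"].add(word)
            else:
                sets[i]["B"].add(word)
                sets[j - 1]["E"].add(word)
                for k in range(i + 1, j - 1):
                    sets[k]["M"].add(word)
    return sets


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_pool(vectors):
    """Single-head scaled dot-product self-attention over the word vectors of
    one word set, followed by mean pooling. A simplification: no learned
    projections and one head instead of several."""
    if not vectors:
        return []
    d = len(vectors[0])
    attended = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[t] for w, v in zip(weights, vectors)) for t in range(d)]
        )
    return [sum(row[t] for row in attended) / len(attended) for t in range(d)]


# Hypothetical example: "kiwifruit canker disease" with a four-word toy lexicon.
toy_lexicon = {"猕猴桃", "溃疡", "溃疡病", "病"}
word_sets = build_word_sets("猕猴桃溃疡病", toy_lexicon)
```

Here `word_sets[3]["B"]` holds both 溃疡 and 溃疡病, so the character 溃 sees two candidate words; an attention step such as `self_attention_pool` would then weight their embeddings before the result is concatenated with the character representation.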
Authors
李书琴
张明美
刘斌
LI Shuqin; ZHANG Mingmei; LIU Bin (College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China)
Source
《农业机械学报》
EI
CAS
CSCD
PKU Core (Peking University core journal list)
2022, No. 12, pp. 323-331 (9 pages)
Transactions of the Chinese Society for Agricultural Machinery
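Funding
National Key Research and Development Program of China (2020YFD1100601)
Key Research and Development Program of Shaanxi Province (2021NY-138)
Fundamental Research Funds for the Central Universities (2452019064)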
Keywords
kiwifruit planting
named entity recognition
character-word fusion
semantic enhancement
self-attention mechanism
pre-trained language model