Abstract
Named entity recognition (NER) in the water conservancy domain is important for building water conservancy knowledge graphs and intelligent question answering systems. Current NER work in this domain, however, suffers from a lack of annotated corpora, the low recognition accuracy of traditional methods, and an inability to handle polysemous entities. Based on the characteristics of water conservancy texts, this paper proposes MRC-WLE, a machine reading comprehension (MRC) NER model with data augmentation (vocabulary and entity type labels). The model injects vocabulary feature information and entity type label feature information from water conservancy texts as "knowledge". BERT-CRF, BERT-CRF-Word, BERT-BiLSTM-CRF and BERT-BiLSTM-CRF-Word are used as baselines to evaluate the performance of MRC-WLE. The results show that MRC-WLE achieves a higher micro-averaged F1 score than each of these BERT-CRF-based baselines. Compared with the plain MRC model, MRC-WLE raises the micro-averaged F1 score by 0.85%, which reflects the effectiveness of the data augmentation.
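The comparisons above are reported as micro-averaged F1. As a minimal sketch of how that metric is commonly computed for span-level NER (not necessarily the authors' exact evaluation script), the example below pools true positives, false positives and false negatives over all sentences and entity types before computing precision and recall. The function name `micro_f1`, the `(start, end, type)` span representation, and the entity type names are assumptions made for illustration only.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over entity spans.

    gold, pred: lists with one set per sentence; each set holds
    (start, end, entity_type) tuples. A predicted span counts as
    correct only if boundaries and type both match exactly.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # spans found in both gold and prediction
        fp += len(p - g)   # predicted spans not in gold
        fn += len(g - p)   # gold spans that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: the entity types below are illustrative only.
gold = [
    {(0, 2, "RIVER"), (5, 8, "RESERVOIR")},
    {(1, 3, "RIVER")},
]
pred = [
    {(0, 2, "RIVER"), (5, 7, "RESERVOIR")},  # second span has a wrong boundary
    {(1, 3, "RIVER"), (6, 9, "DAM")},        # one spurious extra span
]

score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.4f}")
```

Because counts are pooled before averaging, frequent entity types weigh more heavily in the score than rare ones, which is the usual reason to report micro rather than macro averages on imbalanced label sets.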
Authors
朱永明
邢丹艳
ZHU Yongming; XING Danyan (College of Management, Zhengzhou University, Zhengzhou 450001, China)
Source
《人民黄河》
CAS
Peking University Core Journals (北大核心)
2024, Issue 9, pp. 156-160 (5 pages)
Yellow River
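Funding
General Project of Humanities and Social Sciences Research, Ministry of Education (20YJA630101)
Major Project of the Chinese Society of Academic Degrees and Graduate Education (2020ZDB20).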