
Named Entity Recognition in Chinese Rice Breeding Questions Based on Text Data Augmentation
Abstract: Existing rice breeding question answering systems suffer from low-level data management and coarse knowledge granularity. In addition, the rice breeding domain lacks publicly available labeled data for named entity recognition, and manual annotation is costly. To address these issues, an approach based on text data augmentation was proposed for recognizing named entities in rice breeding questions. A rice breeding knowledge graph was constructed to help subdivide large entity categories, such as rice characteristics, into smaller subcategories, such as resistance to abiotic stress and eating quality, which sharpens entity boundaries and makes knowledge granularity finer. Because high annotation costs leave little labeled rice breeding data and thus degrade named entity recognition performance, a data augmentation layer was introduced into the BERT-BILSTM-CRF model, yielding the DA-BERT-BILSTM-CRF model. Using manually labeled rice breeding questions as training data, the proposed model was compared with three baseline models. In the overall named entity recognition experiment under the small-class entity division, the model achieved a precision of 93.86%, a recall of 92.82%, and an F1 score of 93.34%, exceeding the best-performing baseline, BERT-BILSTM-CRF, by 4.98, 5.3, and 5.15 percentage points, respectively. In single-class entity recognition it achieved a precision of 94.26% and an F1 score of 93.32%. The experiments showed that the proposed approach performed better than the baselines in both overall and single-class named entity recognition on rice breeding questions.
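The overall-recognition figures in the abstract can be sanity-checked with the standard definition of F1 as the harmonic mean of precision and recall. The short sketch below is not from the paper; the function name `f1_score` is an illustrative choice, and the baseline values are not reported directly in the abstract but are derived here by subtracting the stated margins (4.98, 5.3, and 5.15 percentage points) from the proposed model's scores.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall NER scores for DA-BERT-BILSTM-CRF as stated in the abstract (percent).
da_precision, da_recall, da_f1 = 93.86, 92.82, 93.34

# Margins over the best baseline (BERT-BILSTM-CRF), in percentage points.
margin_p, margin_r, margin_f1 = 4.98, 5.3, 5.15

# Assumption: baseline scores implied by the margins, not quoted in the abstract.
base_precision = da_precision - margin_p
base_recall = da_recall - margin_r
base_f1 = da_f1 - margin_f1

# Recompute F1 from precision and recall and compare with the reported values.
da_f1_check = round(f1_score(da_precision, da_recall), 2)
base_f1_check = round(f1_score(base_precision, base_recall), 2)

print(f"proposed: reported F1 = {da_f1:.2f}, recomputed = {da_f1_check:.2f}")
print(f"baseline: implied F1 = {base_f1:.2f}, recomputed = {base_f1_check:.2f}")
```

Both recomputed values agree with the reported and implied F1 scores to two decimal places, so the precision, recall, and F1 figures in the abstract are mutually consistent.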
Authors: NIU Peiyu (牛培宇); HOU Chen (侯琛) (College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; National Engineering Laboratory for Big Data Analysis and Applications, Peking University, Beijing 100871, China; PKU-Changsha Institute for Computing and Digital Economy, Changsha 410205, China)
Source: Transactions of the Chinese Society for Agricultural Machinery (《农业机械学报》), 2024, No. 8, pp. 333-343 (11 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (62303472).
Keywords: rice breeding; question answering system; named entity recognition; text data augmentation; knowledge graph