Abstract
Lycium barbarum, one of the major cash crops in Ningxia, is parasitized by many pests, and its yield is easily affected. Improving pest control for Lycium barbarum is therefore of great significance for stabilizing the local economy. In the era of big data, multimodal data such as images and text have grown explosively, and traditional single-modal retrieval can no longer meet the demand for diversified information retrieval. Cross-modal retrieval, which retrieves across modalities such as images and text, better matches the need for comprehensive and flexible information retrieval, so building datasets that cross-modal retrieval techniques can use is of real practical value for crop pest control. To this end, we constructed an image-text cross-modal retrieval dataset of Lycium barbarum pests, covering image acquisition, text writing, data augmentation, classification, and image-text correspondence. The dataset contains 492 MB of image and text data for 17 common diseases and insect pests of Lycium barbarum, with 9,496 pest images and 9,496 corresponding text description files, which is sufficient for use as training samples for image-text cross-modal retrieval. The dataset provides a valuable basic data resource for cross-modal retrieval of Lycium barbarum pests and can also serve as a standard dataset for machine learning in agricultural big data settings. It has practical application value for promoting cross-modal methods in agriculture, for research on the prevention and control of Lycium barbarum pests, and for improving Lycium barbarum yield.
Authors
陈磊
刘立波
王晓丽
CHEN Lei; LIU Libo; WANG Xiaoli (School of Information Engineering, Ningxia University, Yinchuan 750021, P.R. China; Agricultural Information Institute of CAAS, Beijing 100081, P.R. China; National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya 572024, P.R. China; National Agriculture Science Data Center, Beijing 100081, P.R. China)
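Funding
National Natural Science Foundation of China (61862050)
Construction of a Governance and Mining Platform for Fundamental and Long-term Agricultural Scientific Data, Academy-level Basic Scientific Research Fund of the Chinese Academy of Agricultural Sciences (Y2022LM20)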
Keywords
Lycium barbarum pests
cross-modal retrieval
training samples
standard image-text library