Generating image description of rice pests and diseases using a ResNet18 feature encoder (original title: 基于ResNet18特征编码器的水稻病虫害图像描述生成)

Cited by: 7
Abstract To address the lack of applications of image description algorithms in agriculture and the large number of parameters in conventional models, this study proposes an image description algorithm based on a ResNet18 feature encoder that identifies the type of crop disease and generates a description. First, an image description dataset of rice pests and diseases was built. Second, the shallow ResNet18 was used as the encoder, reducing the model size while preserving feature extraction capability, and a Long Short Term Memory (LSTM) network with an attention mechanism was used as the decoder to generate the image descriptions. The experimental results show that the improved model is 1/3 the size of the original, that the model basically converges after 6000 iterations, and that the accuracy reaches 98.48%. On the rice pests and diseases image description dataset, the Bilingual Evaluation Understudy (BLEU) and METEOR (Metric for Evaluation of Translation with Explicit ORdering) scores of the improved encoder-decoder structure reach 0.752 and 0.404, respectively, and the remaining metrics are also clearly better than those of the other models. The model produces detailed and accurate descriptions, is robust, is better suited to training on small-scale datasets, and can provide a useful reference for the automated description of similar crop disease characteristics.

Pests and diseases pose a serious threat to agricultural production and crop yields. Image description of agricultural pests and diseases can contribute greatly to the intelligent monitoring and diagnosis of crop health. However, although current object detection models can identify the class and location of a target, they cannot generate descriptions related to the content of the image. The large number of parameters in these models is also a major challenge for edge computing platforms in practical working scenarios. In this study, an image description model was designed using an encoder-decoder structure in order to bridge the gap between visual features and text semantics. Firstly, images of 10 common rice pests and diseases were collected with a web crawler, which allowed the training images to be acquired in a short time. Secondly, the original data was expanded using luminance adjustment, horizontal or vertical flipping, and Gaussian noise. After data augmentation, each image was manually annotated with five English sentences describing the characteristics of the pests and diseases, and the annotations were stored in JSON format. The data was then divided into training, validation, and test sets of 1793, 222, and 223 images, respectively.

Finally, the encoder-decoder structure was built on the local perception and parameter sharing of Convolutional Neural Networks (CNN). The shallow ResNet18 network served as the encoder to automatically extract image features, whereas the decoder was a Long Short Term Memory (LSTM) network incorporating an attention mechanism to generate the image descriptions. Compared with the CNN, the LSTM performs better on sequential tasks: it handles the long-term dependency problem of the Recurrent Neural Network (RNN) and alleviates vanishing and exploding gradients during long-sequence training. Three models were trained on the dataset: the traditional CNN-LSTM model, the attention model Att_CNN-LSTM, and AdtAtt_CNN-LSTM, which introduces visual sentinels.

The experimental results showed that the Att_CNN-LSTM model with the ResNet18 feature encoder performed best under the same metrics, with the Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L), and Metric for Evaluation of Translation with Explicit ORdering (METEOR) reaching 0.752, 0.657, and 0.404, respectively. The model size was reduced to nearly one third of the original without loss of feature extraction capability. The model basically converged after 6000 iterations, significantly faster than the model without the attention mechanism, with a Top5 accuracy of 98.48% and a loss of 0.813. All evaluation metrics improved significantly. The most notable improvement was in the Consensus-based Image Description Evaluation (CIDEr) score, which reached 1.623 on the rice pests and diseases dataset, nearly 3 times that of the CNN-LSTM model. Overall, the descriptions generated by the Att_CNN-LSTM model describe the image information in detail, whereas the CNN-LSTM model without attention can only describe the color features of the diseases. The improved model can more accurately diagnose the disease category and supplementary features such as the location of the disease. The 223 test images, together with 268 actual images, were used for verification, and part of the results were visualized. Consequently, the improved model can discriminate between and describe related diseases, and it is well suited to training on small-scale datasets with high accuracy. The findings can also provide a strong reference for automatically describing similar pests and diseases of crops.
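The three augmentation operations named in the abstract (luminance adjustment, horizontal or vertical flipping, and Gaussian noise) can be illustrated with a minimal sketch. This is not the authors' code. The grayscale nested-list image representation, the function names, the luminance factor of 1.2, and the noise standard deviation of 10 are all assumptions chosen for illustration; the abstract does not state the actual parameters.

```python
import random
from typing import List

# Assumption: a grayscale image is a list of rows of 0-255 integer pixels.
Image = List[List[int]]


def clamp(value: float) -> int:
    """Round to the nearest integer and clip to the valid 0-255 range."""
    return max(0, min(255, int(round(value))))


def adjust_luminance(img: Image, factor: float) -> Image:
    """Scale every pixel by `factor` (>1 brightens, <1 darkens)."""
    return [[clamp(p * factor) for p in row] for row in img]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right by reversing each row."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror the image top to bottom by reversing the row order."""
    return [row[:] for row in img[::-1]]


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    return [[clamp(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def augment(img: Image, rng: random.Random) -> List[Image]:
    """Return one augmented copy per operation (parameters are hypothetical)."""
    return [
        adjust_luminance(img, 1.2),
        flip_horizontal(img),
        flip_vertical(img),
        add_gaussian_noise(img, 10.0, rng),
    ]
```

Calling `augment(img, random.Random(0))` on one source image yields four extra samples, which is how a small crawled dataset could be enlarged before annotation. A real pipeline would more likely apply these operations to RGB arrays with an image library, but the arithmetic per pixel is the same.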
Authors Xie Zhouyi (谢州益); Feng Yazhi (冯亚枝); Hu Yanrong (胡彦蓉); Liu Hongjiu (刘洪久) (School of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou 311300, China; Zhejiang Key Laboratory of Intelligent Forestry Monitoring and Information Technology, Hangzhou 311300, China; Key Laboratory of Forestry Sensing Technology and Intelligent Equipment, State Forestry and Grassland Bureau, Hangzhou 311300, China)
Source Transactions of the Chinese Society of Agricultural Engineering (《农业工程学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2022, No. 12, pp. 197-206 (10 pages)
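Funding Humanities and Social Sciences Research Planning Fund of the Ministry of Education (18YJA630037, 21YJA630054); Zhejiang Provincial Natural Science Foundation (LY18G010005).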
Keywords agriculture; algorithm; image description; rice pests and diseases; encoder-decoder framework; ResNet18; attention mechanism