Food Image Recognition via Multi-scale Jigsaw and Reconstruction Network

Cited by: 2
Abstract: In recent years, food image recognition has attracted growing attention because of its wide application in healthy diet management, smart restaurants, and related areas. Unlike other object recognition tasks, food image recognition is a fine-grained problem: food images show high intra-class variability and high inter-class similarity, and they lack fixed semantic patterns and spatial layouts. These properties make food recognition more challenging. This study proposes a multi-scale jigsaw and reconstruction network (MJR-Net) for food image recognition. MJR-Net consists of three parts: a jigsaw and reconstruction module, a feature pyramid module, and a channel-wise attention module. The jigsaw and reconstruction module applies destruction and reconstruction learning, destroying and then reconstructing the original image to extract local discriminative details. The feature pyramid module fuses mid-level features of different sizes to capture multi-scale local discriminative features. The channel-wise attention module models the importance of different feature channels to enhance discriminative visual patterns and suppress noise. In addition, A-softmax loss and Focal loss are used to optimize the network by increasing inter-class variability and reweighting samples, respectively. MJR-Net is evaluated on three food datasets, ETH Food-101, Vireo Food-172, and ISIA Food-500, on which it achieves recognition accuracies of 90.82%, 91.37%, and 64.95%, respectively. Experimental results show that MJR-Net is competitive with other food recognition methods and achieves state-of-the-art performance on Vireo Food-172 and ISIA Food-500. Comprehensive ablation studies and visual analysis further demonstrate the effectiveness of the proposed method.
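The abstract names two ingredients that are easy to illustrate in isolation: destroying an image by shuffling its patches, and reweighting samples with Focal loss. The NumPy sketch below is an illustration under stated assumptions, not the authors' implementation. The abstract does not say how patches are shuffled, so a uniform random permutation over a regular grid is assumed here, and the function names (`jigsaw_shuffle`, `jigsaw_restore`, `focal_loss`), the grid size, and `gamma=2.0` are all hypothetical choices.

```python
import numpy as np


def _to_patches(image, grid):
    """Cut an H x W x C image into grid*grid patches, in row-major order."""
    h, w, c = image.shape
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    ph, pw = h // grid, w // grid
    return (image.reshape(grid, ph, grid, pw, c)
                 .transpose(0, 2, 1, 3, 4)
                 .reshape(grid * grid, ph, pw, c))


def _from_patches(patches, grid):
    """Inverse of _to_patches: stitch row-major patches back into an image."""
    _, ph, pw, c = patches.shape
    return (patches.reshape(grid, grid, ph, pw, c)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(grid * ph, grid * pw, c))


def jigsaw_shuffle(image, grid, rng):
    """Destroy the layout: permute patches uniformly at random (an assumption)."""
    patches = _to_patches(image, grid)
    perm = rng.permutation(grid * grid)
    return _from_patches(patches[perm], grid), perm


def jigsaw_restore(shuffled, perm, grid):
    """Undo jigsaw_shuffle given the permutation that was applied."""
    patches = _to_patches(shuffled, grid)
    return _from_patches(patches[np.argsort(perm)], grid)


def focal_loss(logits, labels, gamma=2.0):
    """Mean multi-class focal loss, -(1 - p_t)^gamma * log(p_t).

    With gamma = 0 this reduces to ordinary cross-entropy; larger gamma
    down-weights samples the classifier already gets right.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    log_pt = log_probs[np.arange(len(labels)), labels]
    pt = np.exp(log_pt)
    return float(np.mean(-((1.0 - pt) ** gamma) * log_pt))
```

In a training setup along the lines the abstract describes, the shuffled image would be fed to the network and the recorded permutation would serve as the target for the reconstruction branch. How MJR-Net actually wires these pieces together is not stated in the abstract.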
Authors: LIU Yu-Xin (刘宇昕); MIN Wei-Qing (闵巍庆); JIANG Shu-Qiang (蒋树强); RUI Yong (芮勇). Affiliations: Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Lenovo Group, Beijing 100085, China
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2022, No. 11, pp. 4379-4395 (17 pages)
Funding: National Natural Science Foundation of China (61972378, U1936203, U19B2040).
Keywords: food image recognition; deep learning; jigsaw and reconstruction; feature pyramid; attention mechanism