Abstract
Multi-label image classification is an important problem in computer vision: the model must predict every label present in an image. An image usually carries more than one label, and variation in the size, position, and pose of objects makes classification harder. Improving how accurately image features are represented is therefore a pressing problem. To address it, this paper proposes a novel dual-stream reconstruction network for image feature extraction. Specifically, the model first applies a dual-stream attention network that extracts features based on channel information and on spatial information, then concatenates them so that the image features retain both channel-level and spatial-level detail. Second, a reconstruction loss function is introduced to constrain the dual-stream network, forcing the two divergent features to have the same expressive ability and thereby pushing both streams toward the ground-truth features. Experimental results on the VOC 2007 and MS COCO multi-label image datasets show that the proposed dual-stream reconstruction network extracts salient features accurately and effectively and achieves better classification accuracy. In addition, in view of the sparsifying effect of the reconstruction loss on model features, the method is also applied to few-shot learning, where experiments show that the proposed model likewise achieves good classification accuracy.
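The pipeline described in the abstract (a channel stream, a spatial stream, concatenation, and a loss that pulls the two streams together) can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the function names, the sigmoid and softmax gating, and the use of mean squared error between the two stream vectors as a stand-in for the reconstruction loss are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def channel_stream(fmap):
    """Assumed channel attention: gate each channel's global average with a sigmoid."""
    means = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    gates = [1.0 / (1.0 + math.exp(-m)) for m in means]
    return [g * m for g, m in zip(gates, means)]

def spatial_stream(fmap):
    """Assumed spatial attention: pool each channel with softmax weights over positions."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    # Saliency of a position is its mean activation across channels.
    sal = [[sum(fmap[k][i][j] for k in range(c)) / c for j in range(w)]
           for i in range(h)]
    peak = max(max(row) for row in sal)  # subtract the max for numerical stability
    exp = [[math.exp(v - peak) for v in row] for row in sal]
    total = sum(sum(row) for row in exp)
    return [sum(fmap[k][i][j] * exp[i][j] for i in range(h) for j in range(w)) / total
            for k in range(c)]

def reconstruction_loss(a, b):
    """Stand-in loss (assumption): mean squared error between the two stream features."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Toy feature map with 2 channels and a 2x2 spatial grid (hypothetical values).
fmap = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 1.0], [0.0, 1.0]]]

ch_feat = channel_stream(fmap)
sp_feat = spatial_stream(fmap)
fused = ch_feat + sp_feat            # feature concatenation of the two streams
loss = reconstruction_loss(ch_feat, sp_feat)
```

In a trained model, a term like `loss` would presumably be added to the multi-label classification loss so that both streams are optimized jointly; the weighting between the two terms is not given in the abstract.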
Authors
FANG Zhong-li (方仲礼)
WANG Zhe (王喆)
CHI Zi-qiu (迟子秋)
School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journal (北大核心)
2022, No. 1, pp. 212-218 (7 pages)
Funding
Shanghai Science and Technology Program (20511100600)
National Natural Science Foundation of China (62076094)
Keywords
Multi-label image recognition
Feature reconstruction
Deep learning
Few-shot learning
Image attention mechanism