Image-level labeled weakly supervised object detection: a survey
Abstract: Object detection is one of the fundamental tasks in computer vision. Depending on the label information available, it can be divided into fully supervised, semi-supervised, and weakly supervised object detection, among others. Weakly supervised object detection aims to train a detector using only image-level category labels and then to localize and classify all target objects in test images. Because it can significantly reduce data annotation costs, weakly supervised object detection has attracted increasing attention and has made remarkable progress. Starting from the research significance of weakly supervised object detection, this paper first introduces its label setting and problem definition, the basic framework based on multiple instance learning, and the three major challenges it faces: local dominance, instance ambiguity, and computational cost. It then groups the representative algorithms in the field into three categories according to their core network architecture, namely algorithms based on optimized proposal generation, algorithms combined with image segmentation, and algorithms based on self-training, and describes the core contributions of each category. Furthermore, the paper experimentally compares the detection performance of various weakly supervised object detection algorithms under multiple evaluation metrics. On the VOC2007 (visual object classes 2007) dataset, the method with the highest mean average precision (mAP) is MIST (multiple instance self-training) (54.9%), and the method with the highest correct localization (CorLoc) is SLV (spatial likelihood voting) (71.1%). On the VOC2012 dataset, the method with the highest mAP is NDI-WSOD (negative deterministic information weakly supervised object detection) (53.9%), and the method with the highest CorLoc is P-MIDN (pyramidal multiple instance detection network) (73.3%). On the MSCOCO (Microsoft common objects in context) dataset, the method with the highest average precision on the validation set at an intersection over union (IoU) threshold of 50% (ValAP50) is again P-MIDN (27.4%). Finally, future research directions for weakly supervised object detection are discussed. The algorithmic frameworks summarized in this paper offer a useful reference for subsequent researchers in network design, model exploration, and choice of optimization directions.

Object detection is a fundamental problem in computer vision and image processing. From the perspective of supervision, it can be divided into fully supervised, semi-supervised, and weakly supervised object detection. In recent years, object detection has played an important role in various areas and shown great application value. Precise object detection depends on accurate region-level or instance-level image labeling during detector training. However, the complexity of backgrounds and the diversity of objects in real scenes make accurate labeling extremely time-consuming and laborious. In particular, traditional fully supervised object detection algorithms require the position and category of each object in an image to be marked manually with a minimal bounding rectangle, which increases the cost of acquiring training labels. By contrast, weakly supervised object detection (WSOD) algorithms require only the category labels of the whole image for training, so a large number of training samples can be obtained easily by searching image websites for the category labels. Because it can remarkably reduce the labor cost of labeling, WSOD has received increasing attention and achieved encouraging progress, and researchers have therefore focused on WSOD algorithms based on coarse image-level labels, which depend far less on supervised information. Compared with object detection under other forms of supervision, WSOD aims to localize and classify objects in an image by using only image-level category annotations. This study starts with the research significance of WSOD. First, the definition, basic framework, and main challenges of WSOD are introduced: 1) Like a standard detector, a WSOD model has a training phase and a test phase, and the whole WSOD problem can be understood as learning a mapping from the candidate boxes contained in an image to the image's category labels. 2) The problem setup of WSOD is consistent with that of multiple instance learning (MIL) in weakly supervised learning. WSOD can therefore be treated as an MIL problem in which each candidate box is an instance and the image containing all the candidate boxes is a "bag." For each category, an image is a positive bag if it contains at least one object of that category; otherwise, it is a negative bag. Detector parameters can thus be learned from the candidate boxes in the images: if an image is predicted to be a positive bag for a certain class, it contains an object of that class, and that object can be localized with a rectangular candidate box. 3) WSOD faces three major problems: local dominance, instance ambiguity, and high computational and memory cost. Afterward, advanced WSOD algorithms are classified into three categories according to their network architectures: algorithms based on optimized candidate box generation, segmentation-based algorithms, and self-training-based algorithms. The core of the first category is an improved candidate box generator in the basic framework, whereas the core of the segmentation-based and self-training-based algorithms is an improved detector in the basic framework. The difference is that segmentation-based algorithms add a segmentation branch and use segmentation to guide detection, whereas self-training-based algorithms optimize the detection network itself. Furthermore, the detection results of various WSOD algorithms are compared under several evaluation metrics through extensive experiments. This study selects and compares the current mainstream WSOD algorithms on the PASCAL visual object classes 2007 (VOC2007) and VOC2012 datasets. To ensure a fair comparison, all algorithms use the Visual Geometry Group network VGG16, pretrained on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset, as the feature-extraction backbone. Moreover, only the performance of each model itself is evaluated, without the additional gain from retraining a fully supervised detector such as Fast R-CNN. In the mean average precision (mAP) comparison on VOC2007, multiple instance self-training (MIST) performs best, with a single model obtaining 54.9% mAP. The mAP of existing advanced WSOD algorithms lies between 50% and 60%. Compared with the online instance classifier refinement (OICR) algorithm, which is often used as a baseline, MIST improves mAP by less than 15 percentage points, which indicates that the field still has considerable room for improvement. The mAP and correct localization (CorLoc) comparison on VOC2012 shows that negative deterministic information weakly supervised object detection (NDI-WSOD) achieves the best mAP, 53.9%, which is 16 percentage points higher than that of OICR. The best algorithm in terms of CorLoc is the pyramidal multiple instance detection network (P-MIDN), which reaches 73.3%, 11.2 percentage points higher than OICR. In addition, the algorithms are compared on the Microsoft common objects in context (MS COCO) dataset, where the algorithm with the highest ValAP50 is again P-MIDN, at 27.4%. MIST combines optimized pseudo-label generation, a regularization technique, and bounding box regression in the self-training process, which allows it to remain superior to its competitors across different datasets. Thanks to the vigorous development of deep learning, research on WSOD based on image-level labels has made great breakthroughs. However, WSOD still faces many challenges, and a gap remains between it and fully supervised object detection. Finally, several valuable future research directions in this field are discussed: 1) generating a small number of high-quality candidate boxes; 2) designing a reasonable and efficient cooperative framework for detection and segmentation; 3) designing reasonable strategies, or letting the network itself mine more and better positive samples; and 4) designing lightweight network models that can run on mobile terminals.
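The multiple instance learning formulation described above (each proposal is an instance, each image is a bag that is positive for a class if at least one proposal contains that class) can be made concrete with a small sketch. The two-stream scoring below follows the design popularized by WSDDN (weakly supervised deep detection networks), on which many MIL-based WSOD detectors build; it is an illustrative assumption, not code from this survey. The function names and the nested-list inputs are hypothetical; a real detector would compute the logits from RoI features of a backbone such as VGG16.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mil_image_scores(
    cls_logits: List[List[float]], det_logits: List[List[float]]
) -> Tuple[List[List[float]], List[float]]:
    """Two-stream MIL scoring (hypothetical WSDDN-style sketch).

    cls_logits, det_logits: R x C nested lists (R proposals, C classes).
    Returns (proposal scores R x C, image-level scores of length C).
    """
    num_props, num_classes = len(cls_logits), len(cls_logits[0])
    # Classification stream: softmax over classes for each proposal.
    cls_prob = [softmax(row) for row in cls_logits]
    # Detection stream: softmax over proposals for each class, indexed [class][proposal].
    det_prob = [softmax([det_logits[r][c] for r in range(num_props)])
                for c in range(num_classes)]
    # Proposal score is the product of the two streams.
    prop_scores = [[cls_prob[r][c] * det_prob[c][r] for c in range(num_classes)]
                   for r in range(num_props)]
    # Image score sums over proposals; it stays in [0, 1] because
    # det_prob sums to 1 over proposals for every class.
    image_scores = [sum(prop_scores[r][c] for r in range(num_props))
                    for c in range(num_classes)]
    return prop_scores, image_scores


def mil_loss(image_scores: Sequence[float], labels: Sequence[int],
             eps: float = 1e-6) -> float:
    """Binary cross-entropy between image scores and 0/1 image-level labels."""
    loss = 0.0
    for s, y in zip(image_scores, labels):
        s = min(max(s, eps), 1.0 - eps)  # clamp to avoid log(0)
        loss -= y * math.log(s) + (1 - y) * math.log(1.0 - s)
    return loss


# Two proposals, three classes; proposal 0 looks strongly like class 0.
cls_logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
det_logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
props, img = mil_image_scores(cls_logits, det_logits)
print([round(s, 3) for s in img])
print(round(mil_loss(img, [1, 0, 0]), 3))
```

Because the loss constrains only the image-level sum, nothing forces the highest-scoring proposal to cover the whole object; a small discriminative part can already push the image score toward 1. This is one way to see the local dominance problem listed above.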
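Self-training methods such as OICR and MIST turn the detector's own proposal scores into instance-level pseudo labels that supervise refinement branches. The following is a simplified sketch in the spirit of MIST's pseudo-label generation, not its actual implementation: for one positive class, keep the top `top_ratio` fraction of proposals by score, then greedily retain only those that do not overlap an already kept seed (IoU below `iou_thresh`), so several separate instances of the class can become pseudo ground truth. With a single seed it reduces to the top-scoring proposal that OICR-style refinement relies on. The defaults 0.15, 0.2, and 0.5, and all function names, are illustrative assumptions; MIST's regularization and box regression, and OICR's ignore thresholds, are omitted.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), continuous coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mine_pseudo_boxes(boxes: Sequence[Box], scores: Sequence[float],
                      top_ratio: float = 0.15, iou_thresh: float = 0.2) -> List[int]:
    """Indices of pseudo ground-truth proposals for ONE positive class (hypothetical helper)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    k = max(1, math.ceil(len(order) * top_ratio))  # always keep at least one seed
    seeds: List[int] = []
    for i in order[:k]:
        # Skip candidates that overlap a seed we already kept.
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in seeds):
            seeds.append(i)
    return seeds


def assign_labels(boxes: Sequence[Box], seeds: Sequence[int], cls_id: int,
                  fg_thresh: float = 0.5, bg_id: int = 0) -> List[int]:
    """Label each proposal cls_id if it overlaps some seed enough, else background."""
    labels = []
    for b in boxes:
        best = max((iou(b, boxes[s]) for s in seeds), default=0.0)
        labels.append(cls_id if best >= fg_thresh else bg_id)
    return labels


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60),
         (0, 0, 9, 9), (100, 100, 110, 110), (200, 200, 210, 210)]
scores = [0.9, 0.85, 0.8, 0.1, 0.05, 0.01]
seeds = mine_pseudo_boxes(boxes, scores, top_ratio=0.5)
print(seeds, assign_labels(boxes, seeds, cls_id=7))
```

Keeping more than one seed per class is what lets the second object at (50, 50, 60, 60) receive a foreground label here; a top-1 rule would have labeled it background.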
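The mAP and ValAP50 figures quoted above are per-class average precision (AP) values at an IoU threshold of 0.5, averaged over classes. The sketch below computes AP for one class by greedy matching in descending score order followed by all-point interpolation of the precision-recall curve. It is an assumption for illustration: the official VOC2007 protocol uses 11-point interpolation and MS COCO uses 101-point interpolation, and VOC "difficult" objects are not handled, so results can differ slightly from official evaluation tools. The function names and input layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[str, float, Box]       # (image_id, score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: Sequence[Detection], gts: Dict[str, List[Box]],
                      thresh: float = 0.5) -> float:
    """AP for one class at a single IoU threshold (all-point interpolation)."""
    n_gt = sum(len(v) for v in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(v) for img, v in gts.items()}
    tp_flags: List[int] = []
    for img, _score, box in sorted(dets, key=lambda d: d[1], reverse=True):
        # Find the best-overlapping ground-truth box in the same image.
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gts.get(img, [])):
            o = iou(box, g)
            if o > best_iou:
                best_iou, best_j = o, j
        if best_iou >= thresh and not matched[img][best_j]:
            matched[img][best_j] = True
            tp_flags.append(1)  # true positive
        else:
            tp_flags.append(0)  # false positive: miss or duplicate
    # Cumulative precision and recall after each ranked detection.
    recalls, precisions, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        recalls.append(tp / n_gt)
        precisions.append(tp / k)
    # Make precision non-increasing in recall, then integrate.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_ap(per_class_ap: Dict[str, float]) -> float:
    """mAP: unweighted mean of per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


gts = {"a": [(0, 0, 10, 10)], "b": [(0, 0, 10, 10)]}
dets = [("a", 0.9, (0, 0, 10, 10)),   # correct
        ("a", 0.8, (0, 0, 10, 10)),   # duplicate of an already matched box
        ("b", 0.7, (0, 0, 10, 10))]   # correct
ap = average_precision(dets, gts)
print(round(ap, 4))
```

The duplicate at score 0.8 counts as a false positive, which is why AP here falls below 1 even though every object is found.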
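The CorLoc figures quoted above measure localization rather than ranked detection: for each class, they count the fraction of images containing that class in which the single top-scoring box for that class has an IoU of at least 0.5 with some ground-truth box of the same class, and then average over classes (conventionally on the training/trainval images). The sketch below implements that definition under stated assumptions: boxes are (x1, y1, x2, y2) in continuous coordinates (the official VOC devkit uses a +1 pixel convention), and the per-image dictionary layout and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def corloc(images: Sequence[Dict[str, Tuple[Box, List[Box]]]],
           thresh: float = 0.5) -> Tuple[float, Dict[str, float]]:
    """CorLoc over a set of images.

    Each image maps every class present in it to
    (top_scoring_box_for_that_class, ground_truth_boxes_of_that_class).
    Returns (mean CorLoc over classes, per-class CorLoc).
    """
    hits: Dict[str, int] = {}
    counts: Dict[str, int] = {}
    for per_class in images:
        for cls, (top_box, gt_boxes) in per_class.items():
            counts[cls] = counts.get(cls, 0) + 1
            if any(iou(top_box, g) >= thresh for g in gt_boxes):
                hits[cls] = hits.get(cls, 0) + 1
    per_class_corloc = {c: hits.get(c, 0) / n for c, n in counts.items()}
    if not per_class_corloc:
        return 0.0, {}
    mean = sum(per_class_corloc.values()) / len(per_class_corloc)
    return mean, per_class_corloc


images = [
    {"cat": ((0, 0, 10, 10), [(0, 0, 10, 10)])},            # cat localized
    {"cat": ((0, 0, 10, 10), [(20, 20, 30, 30)]),           # cat missed
     "dog": ((0, 0, 4, 4), [(0, 0, 4, 5)])},                # dog localized (IoU 0.8)
]
mean, per_class = corloc(images)
print(mean, per_class)
```

Unlike AP, CorLoc looks at only one box per class per image and ignores every other detection, so a method can score well on CorLoc while still producing many duplicate boxes that hurt its mAP.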
Authors: Chen Zhenyuan (陈震元), Wang Zhendong (王振东), Gong Chen (宫辰)
Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing 210094, China; Jiangsu Key Laboratory of Image and Video Understanding for Social Security, Nanjing 210094, China
Source: Journal of Image and Graphics (《中国图象图形学报》; indexed in CSCD and the Peking University Core Journals list), 2023, No. 9, pp. 2644-2660 (17 pages)
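Funding: National Natural Science Foundation of China (61973162); Natural Science Foundation of Jiangsu Province (BZ2021013); Jiangsu Province Science Fund for Distinguished Young Scholars (BK20220080); Fundamental Research Funds for the Central Universities (30920032202, 30921013114).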
Keywords: weakly-supervised object detection; weakly-supervised semantic segmentation; proposal generator; self-training