Image-level labeled weakly supervised object detection: a survey

Abstract: Object detection is a fundamental task in computer vision and image processing. Depending on the type of supervision, it can be divided into fully supervised, semi-supervised, and weakly supervised object detection. In recent years, object detection has played an important role in many areas and has shown great application value. Precise object detection depends on accurate region-level or instance-level annotations during detector training. However, complex backgrounds and the diversity of objects in real scenes make accurate annotation extremely time-consuming and laborious. In particular, traditional fully supervised object detection algorithms require the position and category of every object in an image to be marked manually with a tight bounding box, which raises the cost of acquiring training labels. By contrast, weakly supervised object detection (WSOD) algorithms require only image-level category labels for training, so a large number of training samples can be collected easily by searching for category names on image websites. Because it remarkably reduces annotation cost, WSOD has received increasing attention and achieved encouraging progress, and researchers have turned to WSOD algorithms that use only coarse image-level labels and therefore depend much less on supervision. Compared with other object detection settings, WSOD aims to localize and classify all objects in a test image while using only image-level category annotations for training.

This survey starts from the research significance of WSOD. First, the problem definition, basic framework, and main challenges of WSOD are introduced: 1) Like a standard detector, WSOD has a training phase and a test phase, and the whole problem can be understood as learning a mapping from the candidate boxes contained in an image to the image-level category labels. 2) The problem setup of WSOD is consistent with multiple instance learning (MIL) in weakly supervised learning, so WSOD can be treated as an MIL problem in which each candidate box is an instance and the image containing all the candidate boxes is a "bag". For each category, an image is a positive bag if it contains at least one object of that category; otherwise, it is a negative bag. Detector parameters can therefore be learned from the candidate boxes in the images: if an image is predicted to be a positive bag for a certain class, then the image contains an object of that class, and the object can be localized with a rectangular candidate box. 3) WSOD faces three major problems: local dominance, instance ambiguity, and high computational and memory consumption.

Next, representative WSOD algorithms are grouped into three categories according to their core network architecture: algorithms based on optimized candidate box generation, algorithms combined with image segmentation, and algorithms based on self-training, and the core contributions of each category are described. The core of the first category is an improved candidate box generator within the basic framework, whereas the core of the segmentation-based and self-training-based algorithms is an improved detector. The difference between these two is that segmentation-based algorithms add a segmentation branch and use segmentation to guide detection, whereas self-training-based algorithms optimize the detection network itself.

Furthermore, the detection performance of the current mainstream WSOD algorithms is compared under several evaluation metrics on the PASCAL Visual Object Classes 2007 (VOC2007) and VOC2012 datasets. To keep the comparison fair, all algorithms use the Visual Geometry Group VGG16 network pretrained on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset as the feature-extraction backbone, and only the performance of each model itself is evaluated, without the gains from an additional fully supervised detector such as Fast R-CNN. On VOC2007, multiple instance self-training (MIST) achieves the highest mean average precision (mAP), 54.9% with a single model, and spatial likelihood voting (SLV) achieves the highest correct localization (CorLoc), 71.1%. The mAP of existing advanced WSOD algorithms lies between 50% and 60%. Compared with online instance classifier refinement (OICR), which is often used as the baseline, MIST improves mAP by less than 15 percentage points, which indicates that the field still has considerable room for improvement. On VOC2012, negative deterministic information weakly supervised object detection (NDI-WSOD) achieves the highest mAP, 53.9%, which is 16 percentage points higher than OICR, and pyramidal multiple instance detection network (P-MIDN) achieves the highest CorLoc, 73.3%, which is 11.2 percentage points higher than OICR. The algorithms are also compared on the Microsoft Common Objects in Context (MS COCO) dataset, where P-MIDN again achieves the highest ValAP50 (average precision on the validation set at an intersection over union (IoU) threshold of 50%), 27.4%. Because MIST combines optimized pseudo-label generation, a regularization technique, and bounding box regression in its self-training process, it consistently performs strongly against its competitors on different datasets.

Thanks to the vigorous development of deep learning, research on WSOD with image-level labels has made great breakthroughs. However, WSOD still faces many challenges, and a gap remains between it and fully supervised object detection. Finally, several valuable future research directions are discussed: 1) generating a small number of high-quality candidate boxes; 2) designing a reasonable and efficient cooperative framework for detection and segmentation; 3) designing reasonable strategies, or letting the network itself mine more and better positive samples; and 4) designing lightweight network models that can be deployed on mobile devices. The WSOD algorithm framework summarized in this survey offers a reference for subsequent research on network design, model exploration, and optimization directions.
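
The abstract frames WSOD as multiple instance learning (MIL): each image is a bag whose instances are candidate boxes, and the bag is positive for a class if at least one instance is. The minimal Python/NumPy sketch below illustrates that setup. The bag-labeling rule follows the abstract; the two-stream scoring that turns proposal scores into image-level scores is an assumption borrowed from the widely used WSDDN design rather than something this abstract specifies, and all function names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mil_bag_labels(instance_labels: np.ndarray) -> np.ndarray:
    """MIL rule: a bag (image) is positive for class c iff at least one
    instance (candidate box) is positive for c. Input shape: (R, C)."""
    return instance_labels.any(axis=0).astype(int)


def image_scores_two_stream(cls_logits: np.ndarray, det_logits: np.ndarray) -> np.ndarray:
    """Hypothetical WSDDN-style aggregation (an assumption, not from the survey).

    cls_logits, det_logits: (R, C) logits for R proposals and C classes.
    Returns (C,) image-level scores, each in (0, 1].
    """
    cls_prob = softmax(cls_logits, axis=1)  # per proposal: which class?
    det_prob = softmax(det_logits, axis=0)  # per class: which proposal?
    proposal_scores = cls_prob * det_prob   # (R, C) instance-level scores
    # Summing over proposals gives the bag score; det_prob sums to 1 per class,
    # so each image-level score stays at most 1.
    return proposal_scores.sum(axis=0)


def mil_bce_loss(image_scores: np.ndarray, image_labels: np.ndarray, eps: float = 1e-6) -> float:
    """Multi-label binary cross-entropy between image scores and image-level labels."""
    p = np.clip(image_scores, eps, 1.0 - eps)
    y = image_labels
    return float(-np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R, C = 5, 3  # toy sizes: 5 proposals, 3 classes
    scores = image_scores_two_stream(rng.normal(size=(R, C)), rng.normal(size=(R, C)))
    print(scores, mil_bce_loss(scores, np.array([1, 0, 1])))
```

In this assumed WSDDN-style design, `proposal_scores` also serves as the instance-level output used to rank boxes at test time, while training only compares the summed image-level scores with image-level labels.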
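
CorLoc, the second VOC metric quoted in the abstract, is conventionally computed on the training (trainval) images: for each class, it is the fraction of images containing that class in which the single top-scoring box for the class has an IoU of at least 0.5 with some ground-truth box of that class, and the per-class values are then averaged. A minimal Python sketch of IoU and CorLoc under that convention follows; the data layout (dicts keyed by class name and image ID) is a hypothetical choice for illustration.

```python
from typing import Dict, List, Tuple

# (x1, y1, x2, y2) in continuous coordinates; the VOC devkit's "+1 pixel"
# convention is deliberately omitted in this sketch.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corloc(top_boxes: Dict[str, Dict[str, Box]],
           gt_boxes: Dict[str, Dict[str, List[Box]]],
           thresh: float = 0.5) -> float:
    """CorLoc averaged over classes.

    top_boxes[cls][img]: the single highest-scoring predicted box for cls in img.
    gt_boxes[cls][img]:  ground-truth boxes of cls in img, listed only for the
                         images that contain cls (the positive images).
    """
    per_class = []
    for cls, images in gt_boxes.items():
        hits = 0
        for img, gts in images.items():
            pred = top_boxes.get(cls, {}).get(img)
            # Correct localization: the top box overlaps any GT box of the class enough.
            if pred is not None and any(iou(pred, g) >= thresh for g in gts):
                hits += 1
        per_class.append(hits / len(images))
    return sum(per_class) / len(per_class)
```

For example, with one hit and one miss for a toy class `cat` and one hit for a toy class `dog`, the class-averaged CorLoc is (0.5 + 1.0) / 2 = 0.75. Unlike mAP, CorLoc ignores every box except the top one per class and image, so it measures localization rather than full detection.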
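
mAP on VOC and the ValAP50 quoted for MS COCO are both built on per-class average precision (AP) at an IoU threshold of 0.5. The sketch below computes AP for one class by greedily matching score-sorted detections to unused ground-truth boxes and integrating the interpolated precision-recall curve. It is a simplification under stated assumptions: it uses all-point interpolation and ignores VOC's "difficult" flags, whereas the official VOC2007 devkit uses 11-point interpolation and COCO uses 101 recall points, so its values will not exactly match the official toolkits.

```python
from typing import Dict, List, Tuple

import numpy as np

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Tuple[str, float, Box]],
                      gts: Dict[str, List[Box]],
                      thresh: float = 0.5) -> float:
    """AP for one class at IoU >= thresh (AP50 by default).

    dets: (image_id, score, box) detections of this class over the test set.
    gts:  image_id -> ground-truth boxes of this class.
    """
    n_gt = sum(len(v) for v in gts.values())
    matched = {img: [False] * len(v) for img, v in gts.items()}
    dets = sorted(dets, key=lambda d: d[1], reverse=True)  # highest score first
    tp = np.zeros(len(dets))
    fp = np.zeros(len(dets))
    for i, (img, _score, box) in enumerate(dets):
        ious = [iou(box, g) for g in gts.get(img, [])]
        j = int(np.argmax(ious)) if ious else -1
        # True positive only if the best-overlapping GT is close enough and unused;
        # duplicate detections of an already-matched GT count as false positives.
        if j >= 0 and ious[j] >= thresh and not matched[img][j]:
            tp[i] = 1.0
            matched[img][j] = True
        else:
            fp[i] = 1.0
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / max(n_gt, 1)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, np.finfo(float).eps)
    # All-point interpolation: make precision non-increasing, then sum the
    # area under the resulting precision-recall step curve.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for k in range(len(mpre) - 2, -1, -1):
        mpre[k] = max(mpre[k], mpre[k + 1])
    idx = np.flatnonzero(mrec[1:] != mrec[:-1])
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


def mean_ap(per_class_ap: Dict[str, float]) -> float:
    """mAP: unweighted mean of per-class APs."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```

On VOC, mAP is the mean of the 20 per-class APs. A detector that ranks one false positive above its only correct box gets AP = 0.5 for that class in this sketch, which shows how strongly ranking quality affects the metric.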
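
The abstract attributes part of MIST's strength to optimized pseudo-label generation during self-training and names OICR as the usual baseline. As an illustrative assumption, the sketch below shows only the simpler pseudo-labeling rule associated with OICR, not MIST's: for each class present in the image, the highest-scoring proposal from the previous stage becomes a pseudo ground-truth "seed"; every proposal whose IoU with its nearest seed is at least 0.5 inherits that seed's class, all other proposals become background, and each proposal's loss weight is its seed's score. MIST's own selection, which keeps several high-scoring proposals per class, is more elaborate and is not reproduced here; the function names are hypothetical.

```python
import numpy as np


def iou_matrix(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between (N, 4) and (M, 4) arrays of (x1, y1, x2, y2) boxes."""
    x1 = np.maximum(boxes_a[:, None, 0], boxes_b[None, :, 0])
    y1 = np.maximum(boxes_a[:, None, 1], boxes_b[None, :, 1])
    x2 = np.minimum(boxes_a[:, None, 2], boxes_b[None, :, 2])
    y2 = np.minimum(boxes_a[:, None, 3], boxes_b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / np.maximum(union, 1e-12)


def oicr_style_pseudo_labels(boxes: np.ndarray, scores: np.ndarray,
                             image_labels: np.ndarray, fg_thresh: float = 0.5):
    """Simplified OICR-style pseudo-labels (illustrative assumption).

    boxes: (R, 4) proposals; scores: (R, C) proposal scores from the previous
    stage; image_labels: (C,) binary image-level labels.
    Returns (labels, weights): labels use 0 for background and c + 1 for class c.
    """
    R = boxes.shape[0]
    # One seed per positive class: its highest-scoring proposal.
    seed_cls = np.flatnonzero(image_labels)
    if seed_cls.size == 0:
        return np.zeros(R, dtype=int), np.zeros(R)
    seed_idx = scores[:, seed_cls].argmax(axis=0)
    ov = iou_matrix(boxes, boxes[seed_idx])  # (R, S) overlap with each seed
    nearest = ov.argmax(axis=1)              # seed each proposal overlaps most
    max_ov = ov.max(axis=1)
    labels = np.where(max_ov >= fg_thresh, seed_cls[nearest] + 1, 0)
    # Loss weight: confidence of the seed that supervises this proposal.
    weights = scores[seed_idx[nearest], seed_cls[nearest]]
    return labels, weights
```

Because each refinement stage is supervised by the previous stage's seeds, a seed that covers only a discriminative object part propagates that error; this is one way to see the local dominance problem the abstract lists, and why better pseudo-label generation matters for self-training methods.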

Authors: Chen Zhenyuan, Wang Zhendong, Gong Chen

Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing 210094, China; Jiangsu Key Laboratory of Image and Video Understanding for Social Security, Nanjing 210094, China

Source: Journal of Image and Graphics (indexed in CSCD; Peking University Core Journal), 2023, No. 9, pp. 2644-2660 (17 pages)

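Funding: National Natural Science Foundation of China (61973162); Natural Science Foundation of Jiangsu Province (BZ2021013); Natural Science Fund for Distinguished Young Scholars of Jiangsu Province (BK20220080); Fundamental Research Funds for the Central Universities (30920032202, 30921013114)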

Keywords: weakly-supervised object detection; weakly-supervised semantic segmentation; proposal generator; self-training

CLC number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]
