Abstract
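To improve the adaptability and detection performance of object detection networks in a wider range of occlusion scenarios, a self-supervised masked image modeling method is proposed. Training is divided into two stages: pre-training and fine-tuning. In the pre-training stage, the network is trained on unlabeled images with a pretext task of local masking and reconstruction. In the fine-tuning stage, a pyramid structure based on the vision transformer (ViT) is proposed to handle the scale variation of occluded targets and the detection of objects of different sizes. Comparative experiments on the CrowdHuman and CityPersons datasets show that the self-supervised masked image modeling method outperforms other methods in detecting occluded objects.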
As a fundamental task in computer vision, object detection involves classifying objects and accurately localizing them. In real-world scenes, however, objects are frequently partially or entirely occluded, which substantially complicates detection. To improve the versatility and accuracy of object detection networks across many occlusion scenarios, this paper introduces a self-supervised masked image modeling approach. The approach has two stages: pre-training and fine-tuning. During pre-training, a pretext (surrogate) task masks local regions of unlabeled images and trains the model to reconstruct them. This proxy task gives the model pre-training experience that helps it adapt to a range of occlusion patterns and degrees. In the subsequent fine-tuning stage, the method addresses the difficulty of detecting objects of varying scales and sizes in occluded scenes: a pyramid structure, ViT-FPN (Vision Transformer Feature Pyramid Network), is built on the Vision Transformer (ViT), a state-of-the-art architecture in computer vision. ViT-FPN substantially improves the detector's ability to handle diverse occlusion scenarios. The method is evaluated on the CrowdHuman and CityPersons benchmark datasets, and the experimental results demonstrate that the proposed self-supervised masked image modeling approach outperforms other methods in detecting occluded objects.
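The abstract does not give the mask shape, mask ratio, patch size, or reconstruction loss used in pre-training. The sketch below is therefore a generic illustration of a "local masking and reconstruction" pretext task in the style of MAE/SimMIM, not the authors' implementation. It assumes images are split into square patches, that square blocks of neighbouring patches are hidden (loosely mimicking a local occluder), and that the reconstruction loss is a mean squared error computed only over hidden patches. The names `patchify`, `local_block_mask`, and `masked_mse` and all parameter values are hypothetical.

```python
import numpy as np

def patchify(images, patch):
    """Split (N, H, W, C) images into (N, L, patch*patch*C) flattened patches."""
    n, h, w, c = images.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch"
    gh, gw = h // patch, w // patch
    x = images.reshape(n, gh, patch, gw, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (N, gh, gw, patch, patch, C)
    return x.reshape(n, gh * gw, patch * patch * c)

def local_block_mask(grid_h, grid_w, block, num_blocks, rng):
    """Return a flat boolean mask over the patch grid (True = patch hidden).

    Assumption: occlusion is simulated by hiding `num_blocks` square blocks of
    `block` x `block` adjacent patches at random positions (blocks may overlap).
    """
    mask = np.zeros((grid_h, grid_w), dtype=bool)
    for _ in range(num_blocks):
        top = rng.integers(0, grid_h - block + 1)   # upper bound is exclusive
        left = rng.integers(0, grid_w - block + 1)
        mask[top:top + block, left:left + block] = True
    return mask.reshape(-1)

def masked_mse(pred, target, mask):
    """Mean squared error averaged over hidden patches only.

    pred, target: (N, L, D) patch values; mask: (N, L) boolean, True = hidden.
    """
    per_patch = ((pred - target) ** 2).mean(axis=-1)  # (N, L)
    m = mask.astype(per_patch.dtype)
    return float((per_patch * m).sum() / max(m.sum(), 1.0))

# Usage: two random 32x32 RGB images, 8x8 patches -> a 4x4 patch grid.
rng = np.random.default_rng(0)
images = rng.random((2, 32, 32, 3))
patches = patchify(images, patch=8)  # shape (2, 16, 192)
mask = np.stack([local_block_mask(4, 4, block=2, num_blocks=1, rng=rng)
                 for _ in range(2)])
# An encoder would only see the visible patches; here hidden ones are zeroed.
visible_input = np.where(mask[..., None], 0.0, patches)
# A decoder's prediction would replace `visible_input` below.
loss = masked_mse(visible_input, patches, mask)
```

Restricting the loss to hidden patches means the model is rewarded only for inferring content it cannot see, which is the property the pretext task relies on to learn occlusion-robust features.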
Authors
FENG Xin (冯欣)
HU Chenghang (胡成杭)
FENG Xin; HU Chenghang (College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China)
Source
Journal of Chongqing University of Technology: Natural Science (《重庆理工大学学报(自然科学)》)
CAS
Peking University Core Journal
2024, No. 6, pp. 186-193 (8 pages)
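Funding
Chongqing Graduate Research and Innovation Project (CYS23678)
Chongqing University of Technology Graduate Education High-Quality Development Project (gzlcx20233194)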