Abstract
Self-supervised learning aims to learn a universal feature representation without labels. To date, most existing self-supervised learning methods are designed and optimized for image classification. These pre-trained models can be sub-optimal for dense prediction tasks due to the discrepancy between image-level and pixel-level prediction. To fill this gap, we aim to design an effective, dense self-supervised learning framework that directly works at the level of pixels (or local features) by taking into account the correspondence between local features. Specifically, we present dense contrastive learning (DenseCL), which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images. Compared to supervised ImageNet pre-training and other self-supervised learning methods, our self-supervised DenseCL pre-training demonstrates consistently superior performance when transferred to downstream dense prediction tasks, including object detection, semantic segmentation, and instance segmentation. Specifically, our approach significantly outperforms the strong MoCo-v2 baseline by 2.0% AP on PASCAL VOC object detection, 1.1% AP on COCO object detection, 0.9% AP on COCO instance segmentation, 3.0% mIoU on PASCAL VOC semantic segmentation, and 1.8% mIoU on Cityscapes semantic segmentation. The improvements are up to 3.5% AP and 8.8% mIoU over MoCo-v2, and 6.1% AP and 6.1% mIoU over the supervised counterpart, under the frozen-backbone evaluation protocol.
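To make the pixel-level objective concrete, the sketch below shows one plausible form of a dense contrastive (InfoNCE) loss in PyTorch. It assumes `f_q` and `f_k` are L2-normalized dense feature maps of shape (N, C, H, W) produced from two augmented views, and that `queue` (C, K) holds negative features from other images, as in a MoCo-style pipeline. The correspondence rule (pairing each query pixel with its most similar key pixel) follows the description above; all tensor names and the temperature value are illustrative assumptions, not the authors' reference implementation.

```python
# A minimal sketch of a pixel-level contrastive loss under the assumptions
# stated above; names (f_q, f_k, queue, tau) are hypothetical.
import torch
import torch.nn.functional as F

def dense_contrastive_loss(f_q, f_k, queue, tau=0.2):
    n, c, h, w = f_q.shape
    q = f_q.flatten(2).permute(0, 2, 1)      # (N, HW, C) query pixel features
    k = f_k.flatten(2).permute(0, 2, 1)      # (N, HW, C) key pixel features

    # Dense correspondence: pair each query pixel with the most similar
    # key pixel across the two views (features are assumed L2-normalized,
    # so the dot product is the cosine similarity).
    sim = torch.bmm(q, k.transpose(1, 2))    # (N, HW, HW) pairwise similarities
    idx = sim.argmax(dim=2)                  # (N, HW) index of matched key pixel
    pos = torch.gather(k, 1, idx.unsqueeze(-1).expand(-1, -1, c))  # (N, HW, C)

    # Positive logit: similarity with the matched key pixel.
    l_pos = (q * pos).sum(dim=-1, keepdim=True)      # (N, HW, 1)
    # Negative logits: similarity with queued features from other images.
    l_neg = torch.einsum('npc,ck->npk', q, queue)    # (N, HW, K)

    # Standard InfoNCE: the positive is class 0 among (1 + K) candidates.
    logits = torch.cat([l_pos, l_neg], dim=2) / tau  # (N, HW, 1 + K)
    labels = torch.zeros(n * h * w, dtype=torch.long, device=f_q.device)
    return F.cross_entropy(logits.flatten(0, 1), labels)
```

In the full method this pixel-level loss would be combined with the usual image-level (global) contrastive loss and a momentum-updated key encoder; the sketch isolates only the dense term described in the abstract.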