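Abstract
Visual features that generalize well are essential for computer vision tasks. Methods based on deep neural networks obtain multi-scale feature maps by stacking features layer by layer, which significantly increases computational overhead. To address this problem, a lightweight and efficient scale-in-scale (SIS) convolution operator is proposed by deploying a progressive multi-scale architecture inside the standard convolution operator. Specifically, a transform-divide-and-conquer mechanism is designed to optimize conventional channel-wise computation, reducing the computational cost while enlarging the receptive field within a single convolution layer. In addition, weight sharing and split-and-interact feature operations are introduced and combined with feature recur-and-fuse mechanisms, so that the proposed SIS operator can be combined with other convolutional designs such as the classic ResNet and Res2Net architectures. The SIS operator is deployed in 29-layer, 50-layer, and 101-layer ResNet and Res2Net variants, and the modified models are evaluated on public benchmark datasets including CIFAR, PASCAL VOC, and COCO2017. Experimental results show that the proposed method outperforms contemporary state-of-the-art methods on computer vision tasks including image classification, key point estimation, semantic segmentation, and object detection.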
Visual features with high potential for generalization are critical for computer vision applications. Existing approaches produce multi-scale feature maps by stacking features layer by layer, which incurs a high computational cost. To address this issue, we present a compact and efficient scale-in-scale convolution operator, called SIS, by incorporating an efficient progressive multi-scale architecture into a standard convolution operator. More precisely, the proposed operator uses a channel transform-divide-and-conquer technique to optimize conventional channel-wise computation, thereby lowering the computational cost while simultaneously expanding the receptive field within a single convolution layer. Moreover, the proposed SIS operator incorporates weight sharing with split-and-interact and recur-and-fuse mechanisms for enhanced variant design. The proposed SIS series can be plugged into convolutional backbones such as the well-known ResNet and Res2Net. We incorporated the proposed SIS operator series into 29-layer, 50-layer, and 101-layer ResNet and Res2Net variants and evaluated these modified models on the widely used CIFAR, PASCAL VOC, and COCO2017 benchmark datasets, where they consistently outperformed state-of-the-art models on a variety of major vision tasks, including image classification, key point estimation, semantic segmentation, and object detection.
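The claim that a progressive multi-scale design expands the receptive field inside a single layer can be illustrated with a small calculation. The sketch below is not the SIS operator itself; it assumes a Res2Net-style cascade in which the first channel group is passed through unchanged and each later group is added to the previous group's output before one stride-1 convolution. The function name, the group count, and the kernel size are hypothetical choices made for illustration only.

```python
def cascade_receptive_fields(num_groups: int, kernel_size: int = 3) -> list:
    """Receptive field (along one spatial axis) seen by each channel group.

    Assumed layout (Res2Net-style, not taken from the SIS paper): group 0 is
    passed through unchanged; group i > 0 is added to the output of group
    i - 1 and then filtered by one stride-1 convolution of size kernel_size.
    """
    fields = []
    rf = 1  # an untouched feature sees exactly one input position
    for group in range(num_groups):
        if group > 0:
            # each additional stride-1 convolution widens the field by k - 1
            rf += kernel_size - 1
        fields.append(rf)
    return fields


if __name__ == "__main__":
    # Four channel groups with 3x3 kernels
    print(cascade_receptive_fields(4))  # prints [1, 3, 5, 7]
```

Under these assumptions, one layer with four channel groups already mixes features at four different scales, which is the general effect the abstract attributes to placing a multi-scale structure inside the convolution operator.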
Authors
周满
傅雪阳
刘爱萍
Man Zhou; Xueyang Fu; Aiping Liu (School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China)
Source
Journal of University of Science and Technology of China (《中国科学技术大学学报》)
CAS
CSCD
PKU Core (北大核心)
2022, Issue 4, pp. 56-65, I0003 (11 pages in total)
JUSTC
Funding
Supported in part by the USTC Research Funds of the Double First-Class Initiative (YD2100002003, YD2100002004).
Keywords
multi-scale convolutional operator
image classification
key point estimation
semantic segmentation
object detection