Abstract
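In object detection, anchor-free methods based on the Detection Transformer (DETR) have attracted wide attention because they do not require complex non-maximum suppression (NMS) post-processing. To address the limited global-information extraction capability of the Residual Network (ResNet) backbone used by DETR, this paper proposes an improved DETR object detection method built on a hybrid backbone that fuses a Convolutional Neural Network (CNN) with a Transformer. The backbone is a modified Swin Transformer with multiple ConvNeXt blocks connected in parallel across its hierarchical stages, so that local and global features are extracted and fused; the image features then interact with learnable object queries through cross-attention to generate the predicted boxes. Results on the COCO2017 test set show that the improved DETR fuses features more effectively, raising average AP by 1.6% and FPS by 10.7% compared with the ResNet50 backbone.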
Among object detection tasks, anchor-free methods based on the Detection Transformer (DETR) have attracted wide attention because they do not require complex non-maximum suppression post-processing. To address the insufficient global information extraction capability of the Residual Network (ResNet) backbone used by DETR, an improved DETR object detection method based on a Convolutional Neural Network (CNN) and Transformer hybrid backbone is proposed in this paper. The backbone network is an improved Swin Transformer that connects multiple ConvNeXt blocks in parallel within its hierarchical structure to extract and fuse local and global features; cross-attention between the image features and learnable object queries then generates the prediction boxes. Results on the COCO2017 test set show that the improved DETR method fuses features more effectively, with an average AP increase of 3.8% and an FPS increase of 10.7% compared with the ResNet50 backbone network.
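As a rough illustration of the two ideas named in the abstract (fusing a local branch with a global branch, then letting object queries attend to the fused image features), the following is a minimal pure-Python sketch. It is not the paper's implementation: the function names `fuse` and `cross_attention` are hypothetical, the element-wise sum used for fusion is an assumption (the abstract does not state how the parallel branches are combined), and the attention is single-head with no learned projections.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(local_feats, global_feats):
    """Combine two parallel branches token by token.

    ASSUMPTION: element-wise sum. The abstract only says the ConvNeXt
    blocks run in parallel with the Swin stages, not how they are merged.
    """
    return [[a + b for a, b in zip(l, g)]
            for l, g in zip(local_feats, global_feats)]


def cross_attention(queries, features):
    """Scaled dot-product cross-attention (single head, no projections).

    queries:  N object queries, each a list of d floats
    features: M image tokens,   each a list of d floats
    Returns (outputs, weights): N x d attended vectors and N x M weights.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every image token.
        scores = [scale * sum(qi * fi for qi, fi in zip(q, f))
                  for f in features]
        w = softmax(scores)
        # Weighted sum of image tokens for this query.
        out = [sum(w[j] * features[j][k] for j in range(len(features)))
               for k in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Toy usage: 3 image tokens of width 2, 2 object queries.
local_branch = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
global_branch = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
tokens = fuse(local_branch, global_branch)
attended, attn = cross_attention([[1.0, 0.0], [0.0, 1.0]], tokens)
```

In a full DETR-style decoder each attended query vector would then pass through feed-forward heads that predict a class and a box; that part is omitted here.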
Author
金祖亮
Jin Zuliang (Chongqing Jiaotong University, Chongqing 400074, China)
Source
《无线互联科技》
2022, Issue 23, pp. 109-112 (4 pages)
Wireless Internet Technology