Abstract
To address incomplete pedestrian feature extraction and low detection accuracy in complex traffic scenes, this paper proposes YOLOv5-STRDC, a pedestrian detection algorithm that improves the YOLOv5 network by fusing contextual information with attention mechanisms. First, a Swin Transformer is placed in the backbone network to enrich contextual information while efficiently capturing global information. Second, a Spatial Pyramid Pooling (SPP) module is proposed that fuses five parallel dilated convolutions with an improved Convolutional Block Attention Module (CBAM); it outputs information covering a larger image range while strengthening feature fusion along the channel and spatial dimensions. Finally, a Coordinate Attention (CA) mechanism is integrated to highlight important local regions and obtain more accurate feature information. On the public WiderPerson and INRIA datasets, YOLOv5-STRDC achieves a mean Average Precision (mAP) of 71.60% and 92.01%, respectively, improvements of 1.80% and 1.34% over the YOLOv5 model. The detection speed reaches 137.34 and 114.71 frames/s, respectively, which meets the requirement for real-time detection.
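The abstract credits the parallel dilated (atrous) convolutions in the SPP module with covering a larger image range. The sketch below illustrates why: a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions without adding weights. It is a minimal pure-Python illustration, not the authors' implementation. The helper names and the five dilation rates are assumptions chosen for the example; the abstract does not state the rates used in YOLOv5-STRDC.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Input span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, weights, dilation):
    """1-D dilated convolution without padding (cross-correlation form)."""
    span = effective_kernel(len(weights), dilation)
    return [
        # Taps are spaced `dilation` samples apart.
        sum(w * signal[start + i * dilation] for i, w in enumerate(weights))
        for start in range(len(signal) - span + 1)
    ]


# Hypothetical dilation rates for five parallel branches (not from the paper).
ASSUMED_RATES = [1, 2, 3, 4, 5]

# Span of a 3-tap kernel in each branch: same weight count, wider coverage.
spans = {d: effective_kernel(3, d) for d in ASSUMED_RATES}

if __name__ == "__main__":
    for rate, span in spans.items():
        print(f"dilation={rate}: 3-tap kernel spans {span} inputs")
    print(dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 1, 1], 2))
```

Running branches with different rates in parallel and fusing their outputs lets a module combine fine local detail (small rates) with wider context (large rates) at the same resolution, which is the general motivation for this kind of design.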
Authors
荣幸
张志华
冯东东
袁昊
RONG Xing; ZHANG Zhihua; FENG Dongdong; YUAN Hao (School of Mathematics and Physics, Lanzhou Jiaotong University, Lanzhou 730070, China; National-Local Joint Engineering Research Center of Technologies and Applications for National Geographic State Monitoring, Lanzhou 730070, China; Gansu Provincial Engineering Laboratory for National Geographic State Monitoring, Lanzhou 730070, China; Faculty of Geomatics, Lanzhou Jiaotong University, Lanzhou 730070, China)
Source
《无线电工程》
2024, No. 9, pp. 2152-2161 (10 pages)
Radio Engineering
Funding
National Key Research and Development Program of China (2022YFB3903604)
Natural Science Foundation of Gansu Province (23JRRA870)
Keywords
pedestrian detection
contextual information
dilated convolution
feature pyramid
attention mechanism