Abstract
Semantic segmentation of road scenes helps vehicles perceive their surroundings so that they can avoid pedestrians, other vehicles, and small obstacles, which improves driving safety. To address the low recognition accuracy for small objects in road-scene segmentation and the large number of network parameters, which hinders deployment, this study proposes a semantic segmentation model based on a multi-scale attention mechanism. Exploiting the multi-scale, multi-frequency analysis capability of the wavelet transform, a multi-scale wavelet attention module is designed and embedded in the encoder; by fusing features of different scales and frequencies, it retains more edge and contour detail. Hierarchical connections between the encoder and decoder, together with an improved pyramid pooling module, extract features from multiple aspects, capturing more image detail while preserving contextual information. A multi-level loss function is designed to train the network and accelerate convergence. Experiments on the Cambridge-driving Labeled Video Database (CamVid) show that the model achieves a mean intersection over union (mIoU) of 60.21% and has nearly 30% fewer parameters than the DeepLabV3+ and DenseASPP models. The model improves segmentation accuracy without adding parameters and is robust across different scenes.
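The 60.21% figure in the abstract is a mean intersection over union (mIoU) score. As a minimal sketch of how that metric is commonly computed, the following pure-Python function averages per-class IoU over flattened label maps. The function name `mean_iou`, the toy labels, and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration; the paper's exact evaluation protocol is not stated here.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average the per-class IoU over classes that occur in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where both maps agree on class c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: ignore classes absent from both maps
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical toy data: two flattened 2x3 label maps with 3 classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

On the toy data the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. Because every class contributes equally regardless of its pixel count, small-object classes weigh as much as large ones such as road or sky, which is why the metric is sensitive to the small-object accuracy that the abstract emphasizes.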
Authors
FAN Runze; LIU Yuhong; ZHANG Rongfen; LI Jingyu (College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China)
Source
Computer Engineering
CAS
CSCD
PKU Core
2023, No. 2, pp. 288-295 (8 pages)
Funding
Guizhou Provincial Science and Technology Foundation (Qiankehe Jichu-ZK[2021] Key 001).
Keywords
deep learning
semantic segmentation
attention mechanism
wavelet transform
pyramid pooling