Real-Time Semantic Segmentation Network Based on Attention Mechanism and Multi-Scale Pooling (cited by: 1)
Abstract Existing semantic segmentation algorithms achieve good accuracy, but their speed falls short of real-time requirements. To increase segmentation speed while preserving high accuracy, a new real-time semantic segmentation network is proposed. First, a Fusion Channel Attention Module (FCAM) is designed. Max pooling and average pooling capture global features; the pooled feature maps are concatenated, convolved, and reshaped to obtain the weight of each channel; and matrix multiplication of the original feature map with these channel weights yields the fused channel weights. Element-wise multiplication is then performed between the fused channel weights and the original feature map so that the channel weights are effectively integrated with the original feature map. Second, a lightweight pyramid scene parsing module is proposed. It uses multi-scale pooling to fully capture the features of targets at multiple scales and, relative to the original pyramid scene parsing module, reduces the number of channels in the pooled feature maps, which lowers the amount of computation. The pooled feature maps are concatenated, and the input feature map is used to guide the concatenated feature map so that high-level and low-level feature maps are fused effectively. Experiments on the public Cityscapes image dataset show that the network achieves accuracies of 74.6% and 73.8% on the validation and test sets, respectively, at a segmentation speed of 60.6 frames/s, and that its segmentation performance is better than that of networks such as ICNet and DFANet-A.
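The FCAM steps named in the abstract (pooling, concatenation, convolution, reshape, matrix multiplication, element-wise multiplication) can be illustrated with a minimal pure-Python sketch. The abstract gives no tensor shapes, so the shapes here are one possible reading and not the authors' implementation: the channel weights are assumed to form a 1 x C row vector, the matrix product with the C x (H*W) feature map is assumed to give one fused weight per spatial position, and the 1x1 convolution is stood in for by a hypothetical C x 2C matrix `conv_w`. The names `global_pool`, `channel_weights`, and `fcam` are illustrative.

```python
from typing import List, Tuple

# A feature map is indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def global_pool(x: FeatureMap) -> Tuple[List[float], List[float]]:
    """Return the global max and global average of every channel."""
    maxes = [max(max(row) for row in ch) for ch in x]
    avgs = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    return maxes, avgs


def channel_weights(x: FeatureMap, conv_w: List[List[float]]) -> List[float]:
    """Concatenate the pooled vectors (length 2C) and project them to C weights.

    conv_w is an assumed C x 2C matrix playing the role of a 1x1 convolution.
    """
    maxes, avgs = global_pool(x)
    pooled = maxes + avgs  # concatenation along the channel axis
    return [sum(w * p for w, p in zip(row, pooled)) for row in conv_w]


def fcam(x: FeatureMap, conv_w: List[List[float]]) -> FeatureMap:
    """Assumed reading of FCAM: (1 x C) @ (C x H*W), then element-wise product."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    weights = channel_weights(x, conv_w)  # treated as a 1 x C row vector
    # Matrix multiplication: one fused weight per spatial position.
    fused = [[sum(weights[k] * x[k][i][j] for k in range(c)) for j in range(w)]
             for i in range(h)]
    # Element-wise multiplication, broadcasting the fused weights over channels.
    return [[[x[k][i][j] * fused[i][j] for j in range(w)] for i in range(h)]
            for k in range(c)]
```

Any normalization of the weights (for example a sigmoid) is left out because the abstract does not mention one; a real implementation would likely need it to keep activations bounded.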
Authors WANG Zhuo (王卓); QU Shaojun (瞿绍军) (College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China)
Source Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心); 2023, No. 10, pp. 222-229, 238 (9 pages)
Funding National Natural Science Foundation of China (12071126).
Keywords semantic segmentation; global feature; attention mechanism; pyramid scene parsing; multi-scale pooling
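The lightweight pyramid scene parsing module described in the abstract (multi-scale pooling, fewer channels per pooled map, concatenation) can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the authors' design: the bin sizes and projection matrices are hypothetical inputs, the channel reduction is modeled as a 1x1 convolution given as a C_out x C_in matrix, nearest-neighbor upsampling stands in for whatever interpolation the paper uses, and the step in which the input feature map guides the concatenated map is omitted because the abstract does not specify it.

```python
from typing import Dict, List

# A feature map is indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def adaptive_avg_pool(x: FeatureMap, bins: int) -> FeatureMap:
    """Average-pool each channel to a bins x bins grid.

    Assumes the height and width are divisible by bins.
    """
    h, w = len(x[0]), len(x[0][0])
    sh, sw = h // bins, w // bins
    return [[[sum(ch[i * sh + a][j * sw + b]
                  for a in range(sh) for b in range(sw)) / (sh * sw)
              for j in range(bins)] for i in range(bins)] for ch in x]


def reduce_channels(x: FeatureMap, proj: List[List[float]]) -> FeatureMap:
    """Apply a 1x1 convolution: proj is a C_out x C_in matrix used at every position."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(p * x[k][i][j] for k, p in enumerate(row)) for j in range(w)]
             for i in range(h)] for row in proj]


def upsample_nearest(x: FeatureMap, h: int, w: int) -> FeatureMap:
    """Resize every channel to h x w by nearest-neighbor lookup."""
    bh, bw = len(x[0]), len(x[0][0])
    return [[[ch[i * bh // h][j * bw // w] for j in range(w)] for i in range(h)]
            for ch in x]


def light_psp(x: FeatureMap, projections: Dict[int, List[List[float]]]) -> FeatureMap:
    """Pool at each bin size, shrink the channel count, upsample, and concatenate.

    projections maps a bin size to its (hypothetical) channel-reduction matrix.
    """
    h, w = len(x[0]), len(x[0][0])
    branches: FeatureMap = []
    for bins, proj in projections.items():
        reduced = reduce_channels(adaptive_avg_pool(x, bins), proj)
        branches.extend(upsample_nearest(reduced, h, w))  # concatenate on channels
    return branches
```

In this sketch the output has one channel per projection row, so choosing projections with fewer rows directly shrinks the concatenated map and the cost of any layer that consumes it, which is the kind of saving the abstract attributes to the lightweight module.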