Abstract
Transformer-based semantic segmentation networks have two problems: segmentation accuracy drops significantly when the input resolution changes, and the computational complexity of self-attention is high. To address the first, an adaptive convolutional positional encoding module is proposed, exploiting the property that zero-padded convolution retains positional information. To address the second, a joint resampling self-attention module is proposed that reduces the computation of self-attention, exploiting the property that the dimensions of specific matrices cancel each other in the self-attention computation. A decoder is designed to fuse feature maps from different stages, yielding EA-Former, an efficient segmentation network that adapts to inputs of different resolutions. The best mean intersection over union (mIoU) of EA-Former is 51.0% on ADE20K and 83.9% on Cityscapes. Compared with mainstream segmentation methods, EA-Former achieves competitive segmentation accuracy at lower computational complexity, and the degradation in segmentation performance caused by changes in input resolution is alleviated.
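The two properties the abstract relies on can be illustrated with a small NumPy sketch. This is not the EA-Former implementation: the helper names (`attention`, `average_pool_tokens`), the average-pooling resampler, and the sizes are all assumptions chosen for illustration, and the reading of "dimensions cancel" as the key/value token count dropping out of the output shape is an interpretation of the abstract, not something it states in detail.

```python
import numpy as np

# Part 1: zero-padded convolution leaks position.
# A constant input gives different responses at the border than in the
# interior, so later layers can tell where the border is.
x = np.ones(8)
kernel = np.ones(3)
padded = np.pad(x, 1)                              # zero padding on both ends
conv_out = np.convolve(padded, kernel, mode="valid")

# Part 2: the key/value token count cancels in self-attention.
def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: q is (N, d); k, v are (M, d); result is (N, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (N, M)
    return softmax(scores, axis=-1) @ v            # (N, M) @ (M, d) -> (N, d)

def average_pool_tokens(tokens, ratio):
    """Hypothetical resampler: average every `ratio` consecutive tokens."""
    n, d = tokens.shape
    return tokens.reshape(n // ratio, ratio, d).mean(axis=1)

rng = np.random.default_rng(0)
N, d, ratio = 64, 16, 4
tokens = rng.standard_normal((N, d))

full = attention(tokens, tokens, tokens)           # score matrix is N x N
kv = average_pool_tokens(tokens, ratio)            # (N / ratio, d)
reduced = attention(tokens, kv, kv)                # score matrix is N x (N / ratio)

full_scores = N * N
reduced_scores = N * (N // ratio)
```

In this sketch the output keeps shape `(N, d)` whether or not the keys and values are resampled, while the score matrix shrinks by the resampling ratio; how the paper's joint resampling module actually resamples the inputs is not specified in the abstract.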
Authors
张海波
蔡磊
任俊平
王汝言
刘富
ZHANG Hai-bo; CAI Lei; REN Jun-ping; WANG Ru-yan; LIU Fu (School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Chongqing Key Laboratory of Ubiquitous Sensing and Networking, Chongqing 400065, China; Chongqing Urban Lighting Center, Chongqing 400023, China)
Source
《浙江大学学报(工学版)》
EI
CAS
CSCD
PKU Core
2023, Issue 6, pp. 1205-1214 (10 pages)
Journal of Zhejiang University:Engineering Science
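Funding
National Natural Science Foundation of China (62271094)
Program for Changjiang Scholars and Innovative Research Team in University (IRT16R72)
Chongqing Innovation and Entrepreneurship Support Program for Returned Overseas Scholars, Innovation Category (cx2020059)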