Abstract
When deep learning is applied to the pixel loss problem in semantic segmentation, the main difficulty is that upsampling cannot fully restore pixels. In addition, most existing network models pursue very high prediction accuracy at the cost of complex structures and lower prediction efficiency, which makes real-time requirements hard to meet. To address these problems, a real-time semantic segmentation network model with a multi-layer fusion attention mechanism is proposed. The model restores pixels accurately from two aspects and achieves better real-time performance while maintaining high accuracy. First, in the lateral connections of the U-shaped network, an efficient lateral connection module is built by combining an attention mechanism with the fusion of deep and shallow features; the attention mechanism attends to more comprehensive contextual features, so that as much pixel information as possible is available to the subsequent upsampling process. Second, because the feature points produced by deeper layers have stronger localization ability, the model uses deep features to correct pixel localization during upsampling, which makes pixel restoration more accurate and better addresses the edge smoothing problem in prediction results. Finally, a lightweight model is adopted as the base model, and 1×1 convolutions are used at multiple points in the fusion process for dimensionality reduction. Experimental results on the VOC and Cityscapes datasets show that, with an input image size of 512×1024, the model sustains a processing speed of 64 frames/s and reaches a mean intersection over union (mIoU) of up to 75.3%.
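For readers unfamiliar with the accuracy metric reported above, the following is a minimal sketch of how mean intersection over union is commonly computed from a confusion matrix. It is not the authors' evaluation code; the function name `mean_iou`, the flat label-list input, and the ignore label value 255 are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean IoU over flat sequences of integer class labels.

    Hypothetical helper for illustration; ignore_label=255 is an assumed
    convention for unlabeled pixels, not a value taken from the paper.
    """
    # confusion[t][p] counts pixels whose true class is t and predicted class is p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # unlabeled pixels do not contribute to the score
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]                           # true positives
        fn = sum(confusion[c]) - tp                    # missed pixels of class c
        fp = sum(row[c] for row in confusion) - tp     # pixels wrongly labeled c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 0]
    print(mean_iou(pred, target, num_classes=3))
```

Per-class IoU is the ratio of correctly labeled pixels to the union of predicted and ground-truth pixels for that class; averaging over classes keeps small classes from being dominated by large ones.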
Authors
CHENG Qinghe (程庆贺)
ZHANG Zhenhuan (张振寰)
HU Yan (胡燕)
ZHONG Luo (钟珞)
School of Information Technology, Meiga Polytechnic Institute of Hubei, Xiaogan 432017, China; School of Computer & Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China
Source
Software Guide (《软件导刊》)
2023, Issue 8, pp. 48-53 (6 pages)
Funding
Natural Science Foundation of Hubei Province (2021CFB513)
Key Research and Development Program of Hubei Province (2021BAA030)
Keywords
computer vision
multi-layer fusion
pixel restoration
edge smoothing