Funding: National Natural Science Foundation of China (Nos. 61941109, 62061023); Distinguished Young Scholars of Gansu Province of China (No. 21JR7RA345).
Abstract: In autonomous driving research, understanding the road scene is key to improving driving safety. Semantic segmentation divides an image at the pixel level into regions associated with semantic categories, helping vehicles perceive their surrounding road environment. DeepLabv3+ is a currently popular semantic segmentation model, but it tends to miss small targets and to misclassify similar objects, which leads to rough segmentation boundaries and reduced semantic accuracy. To address this issue, this study builds on the DeepLabv3+ network structure and incorporates attention mechanisms to increase the weight of the segmentation regions, proposing an improved DeepLabv3+ road scene semantic segmentation method that fuses attention mechanisms. First, a position attention module and a channel attention module are introduced in parallel at the DeepLabv3+ encoder to capture more spatial context information and high-level semantic information. Then, at the decoder, an attention mechanism is introduced to restore spatial detail, and the data are normalized to accelerate model convergence. Models with different attention configurations are compared and tested on the CamVid and Cityscapes datasets. The experimental results show that the mean Intersection over Union (mIoU) of the improved model increases by 6.88% and 2.58% on the two datasets, respectively, relative to DeepLabv3+. The method does not significantly increase the computational cost or complexity of the network and achieves a good balance between speed and accuracy.
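The parallel position and channel attention described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes a common simplified form: plain dot-product similarity, softmax weighting, a residual connection, and element-wise summation of the two branches. The function names (`position_attention`, `channel_attention`, `dual_attention`) are hypothetical, and the learned projection layers and scaling factors that a real network would use are omitted.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both nested lists."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def position_attention(x):
    """x is C x N (channels x flattened H*W positions); returns C x N.

    Assumption: identity query/key/value projections instead of learned ones.
    """
    energy = matmul(transpose(x), x)        # N x N similarity between positions
    attn = [softmax(r) for r in energy]     # row i: weights of position i over all positions
    context = matmul(x, transpose(attn))    # each position becomes a weighted sum of all positions
    return add(context, x)                  # residual connection

def channel_attention(x):
    """Same idea, but the similarity is computed between channels (C x C)."""
    energy = matmul(x, transpose(x))        # C x C similarity between channels
    attn = [softmax(r) for r in energy]
    context = matmul(attn, x)               # each channel becomes a weighted sum of all channels
    return add(context, x)

def dual_attention(x):
    """Run both branches in parallel and fuse them by summation (an assumption)."""
    return add(position_attention(x), channel_attention(x))

# Toy feature map: 2 channels, 3 spatial positions.
features = [[1.0, 0.0, 2.0],
            [0.5, 1.5, -1.0]]
fused = dual_attention(features)
print(len(fused), len(fused[0]))  # shape is preserved: 2 3
```

Both branches preserve the input shape, so under these assumptions the fused output could be passed to the rest of an encoder unchanged.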
Funding: Fundamental Research Fund in Heilongjiang Provincial Universities (Nos. 135409602, 135409102).
Abstract: Semantic segmentation is a pixel-level classification task, and contextual information has an important impact on segmentation performance. To capture richer contextual information, we adopt ResNet as the backbone network and design an encoder-decoder architecture based on a multidimensional attention (MDA) module and a multiscale upsampling (MSU) module. The MDA module computes attention matrices along three dimensions to capture the dependencies of each position and adaptively capture image features. The MSU module uses parallel branches to capture multiscale image features, and aggregating these multiscale features enhances contextual information. A series of experiments on the Cityscapes and CamVid datasets demonstrates the validity of the model.
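The idea of parallel multiscale branches followed by aggregation can be sketched on a single-channel feature map. The abstract does not specify the MSU module's internals, so everything here is an assumption for illustration only: each branch average-pools the map by a different factor, upsamples it back with nearest-neighbour interpolation, and the branches are averaged. The names `avg_pool`, `upsample_nearest`, and `multiscale_aggregate`, and the scale set `(1, 2, 4)`, are hypothetical.

```python
def avg_pool(fm, s):
    """Average-pool a 2-D map (list of rows) over non-overlapping s x s blocks."""
    h, w = len(fm), len(fm[0])
    return [[sum(fm[i * s + di][j * s + dj] for di in range(s) for dj in range(s)) / (s * s)
             for j in range(w // s)]
            for i in range(h // s)]

def upsample_nearest(fm, s):
    """Nearest-neighbour upsampling of a 2-D map by an integer factor s."""
    h, w = len(fm), len(fm[0])
    return [[fm[i // s][j // s] for j in range(w * s)] for i in range(h * s)]

def multiscale_aggregate(fm, scales=(1, 2, 4)):
    """Run one pool-then-upsample branch per scale and average the branches.

    Assumption: averaging is used for aggregation; a real module might
    concatenate the branches and apply a learned convolution instead.
    """
    h, w = len(fm), len(fm[0])
    if any(h % s or w % s for s in scales):
        raise ValueError("height and width must be divisible by every scale")
    branches = [upsample_nearest(avg_pool(fm, s), s) for s in scales]
    return [[sum(b[i][j] for b in branches) / len(branches) for j in range(w)]
            for i in range(h)]

# Toy 4 x 4 feature map with values 0..15.
feature_map = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
aggregated = multiscale_aggregate(feature_map)
print(round(aggregated[0][0], 4))  # (0 + 2.5 + 7.5) / 3 = 3.3333
```

In this sketch the coarse branches contribute context from a wider neighbourhood, while the scale-1 branch keeps the original per-pixel detail.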