Recently, semantic segmentation has been widely applied to image processing, scene understanding, and many other tasks. In deep learning-based semantic segmentation in particular, the U-Net, with its convolutional encoder-decoder architecture, is a representative model that was proposed for image segmentation in the biomedical field. It uses max pooling to reduce the size of the image and to improve robustness to noise. However, although it reduces the complexity of the model, max pooling has the disadvantage of discarding some information about the image while reducing it. This paper therefore uses the two diagonal elements of a down-sampling operation in its place. We think that the down-sampled feature maps intrinsically hold more information than max-pooled feature maps, because the down-sampling respects the Nyquist theorem and latent information can be extracted from them. In addition, this paper uses the two other diagonal elements for the skip connection. In decoding, we use Subpixel Convolution rather than transposed convolution to decode the encoded feature maps efficiently. Combining these ideas, this paper proposes a new encoder-decoder model called Down-Sampling and Subpixel Convolution U-Net (DSSC-UNet). To evaluate the proposed model, this paper measures the performance of U-Net and DSSC-UNet on the Cityscapes dataset. DSSC-UNet achieved 89.6% Mean Intersection over Union (Mean-IoU) and U-Net achieved 85.6% Mean-IoU, confirming that DSSC-UNet performs better.
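The abstract does not spell out how the diagonal elements are taken, so the following is only a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes that "diagonal elements" means the stride-2 phases of each 2x2 block: the main-diagonal phases (0,0) and (1,1) feed the encoder, and the other two phases (0,1) and (1,0) feed the skip connection. The function names `split_phases`, `diagonal_downsample`, and `pixel_shuffle` are hypothetical. The pixel shuffle shown here is only the rearrangement step of subpixel convolution; the learned convolution that would precede it is omitted.

```python
import numpy as np


def split_phases(x):
    """Split a (C, H, W) map with even H and W into its four stride-2 phases.

    Returns the phases in the order (0,0), (0,1), (1,0), (1,1), each of
    shape (C, H/2, W/2). No pixel is discarded, unlike max pooling.
    """
    return (
        x[:, 0::2, 0::2],
        x[:, 0::2, 1::2],
        x[:, 1::2, 0::2],
        x[:, 1::2, 1::2],
    )


def diagonal_downsample(x):
    """Assumed reading of the abstract, not the paper's confirmed method.

    Main-diagonal phases go to the encoder path, the other two phases go
    to the skip connection. Each output has shape (2C, H/2, W/2).
    """
    p00, p01, p10, p11 = split_phases(x)
    encoder_path = np.concatenate([p00, p11], axis=0)
    skip_path = np.concatenate([p01, p10], axis=0)
    return encoder_path, skip_path


def pixel_shuffle(x, r=2):
    """Rearrange (C*r*r, H, W) into (C, H*r, W*r).

    Channel c*r*r + i*r + j at position (h, w) moves to channel c at
    position (h*r + i, w*r + j).
    """
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)      # axes: (c, i, j, h, w)
    x = x.transpose(0, 3, 1, 4, 2)    # axes: (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)


if __name__ == "__main__":
    x = np.arange(16, dtype=np.float32).reshape(1, 4, 4)
    enc, skip = diagonal_downsample(x)
    print(enc.shape, skip.shape)

    # For a single-channel input, stacking all four phases and applying
    # the pixel shuffle restores the original map exactly.
    stacked = np.concatenate(split_phases(x), axis=0)
    print(np.array_equal(pixel_shuffle(stacked), x))
```

Under this reading, the encoder and skip paths together hold every input pixel exactly once, which is the sense in which the down-sampled maps keep information that max pooling would drop.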