Funding: Supported in part by the National Natural Science Foundation of China (Nos. 61906135, 62020106004, 92048301, and 61906027) and the Tianjin Science and Technology Plan Project (No. 20JCQNJC01350).
Abstract: Unsupervised image-to-image translation is a challenging task in computer vision. The goal of image translation is to learn a mapping between two domains without corresponding image pairs. Many previous works focused only on image-level translation and ignored the processing of image features, which leads to a loss of semantics, such as changes to the background of the generated image or only partial transformation of the foreground. In this work, we propose an image-to-image translation method based on generative adversarial networks (GANs). We use an autoencoder structure to extract image features in the generator and add a semantic consistency loss on the extracted features to maintain the semantic consistency of the generated image. A self-attention mechanism at the end of the generator is used to capture long-distance dependencies in the image; by enlarging the convolutional receptive field, it also improves the quality of the generated image. Quantitative experiments show that our method significantly outperforms previous works, with an especially marked improvement on images with a salient foreground.
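To make the two key ingredients concrete, the sketch below shows one plausible PyTorch realization, not the paper's exact implementation: a SAGAN-style self-attention block (assumed here as the form of "self-attention at the end of the generator") and a semantic consistency loss computed as an L1 distance between encoder features of the source image and the translated image (the abstract does not specify the distance; L1 is an assumption). The names SelfAttention, semantic_consistency_loss, and encoder are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """SAGAN-style self-attention over spatial feature maps.

    Every position attends to every other position, giving the
    long-distance dependencies the abstract refers to, at a cost
    quadratic in the number of spatial locations.
    """
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Learnable residual weight, initialized to 0 so the block
        # starts as an identity and attention is blended in gradually.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (b, hw, c//8)
        k = self.key(x).flatten(2)                    # (b, c//8, hw)
        attn = F.softmax(torch.bmm(q, k), dim=-1)     # (b, hw, hw)
        v = self.value(x).flatten(2)                  # (b, c, hw)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

def semantic_consistency_loss(encoder, x, y_fake):
    """L1 distance between encoder features of the source image x
    and the translated image y_fake (assumed form of the loss)."""
    return F.l1_loss(encoder(y_fake), encoder(x))
```

In a training step this term would simply be weighted and added to the adversarial loss, e.g. `loss = adv_loss + lam * semantic_consistency_loss(encoder, x, G(x))`, which penalizes translations whose high-level features drift from the input and thereby discourages background changes.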