Parsing of human images is a fundamental task for determining semantic parts such as the face,arms, and legs, as well as a hat or a dress. Recent deep-learning-based methods have achieved significant improvements, but...Parsing of human images is a fundamental task for determining semantic parts such as the face,arms, and legs, as well as a hat or a dress. Recent deep-learning-based methods have achieved significant improvements, but collecting training datasets with pixel-wise annotations is labor-intensive. In this paper,we propose two solutions to cope with limited datasets.Firstly, to handle various poses, we incorporate a pose estimation network into an end-to-end humanimage parsing network, in order to transfer common features across the domains. The pose estimation network can be trained using rich datasets and can feed valuable features to the human-image parsing network. Secondly, to handle complicated backgrounds,we increase the variation in image backgrounds automatically by replacing the original backgrounds of human images with others obtained from large-scale scenery image datasets. Individually, each solution is versatile and beneficial to human-image parsing, while their combination yields further improvement. We demonstrate the effectiveness of our approach through comparisons and various applications such as garment recoloring, garment texture transfer, and visualization for fashion analysis.展开更多
文摘Parsing of human images is a fundamental task for determining semantic parts such as the face,arms, and legs, as well as a hat or a dress. Recent deep-learning-based methods have achieved significant improvements, but collecting training datasets with pixel-wise annotations is labor-intensive. In this paper,we propose two solutions to cope with limited datasets.Firstly, to handle various poses, we incorporate a pose estimation network into an end-to-end humanimage parsing network, in order to transfer common features across the domains. The pose estimation network can be trained using rich datasets and can feed valuable features to the human-image parsing network. Secondly, to handle complicated backgrounds,we increase the variation in image backgrounds automatically by replacing the original backgrounds of human images with others obtained from large-scale scenery image datasets. Individually, each solution is versatile and beneficial to human-image parsing, while their combination yields further improvement. We demonstrate the effectiveness of our approach through comparisons and various applications such as garment recoloring, garment texture transfer, and visualization for fashion analysis.