Specular highlight detection and removal is a fundamental problem in computer vision and image processing. In this paper, we present an efficient end-to-end deep learning model for automatically detecting and removing specular highlights in a single image. In particular, an encoder–decoder network is used to detect specular highlights, and a novel Unet-Transformer network then performs highlight removal; we append transformer modules in place of feature maps in the Unet architecture. We also introduce a highlight detection module whose output serves as a mask to guide the removal task, so the two networks can be trained jointly and effectively. Thanks to the hierarchical and global properties of the transformer mechanism, our framework is able to establish relationships between successive self-attention layers, making it possible to directly model the mapping between the diffuse area and the specular highlight area and to reduce indeterminacy within areas containing strong specular highlight reflection. Experiments on public benchmark and real-world images demonstrate that our approach outperforms state-of-the-art methods on both highlight detection and highlight removal.
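The data flow described in the abstract (a detector produces a mask, the mask guides removal, and both parts share one training objective) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `detect_highlights` and `remove_highlights` are hypothetical placeholder functions standing in for the encoder–decoder detector and the Unet-Transformer remover, and the loss (binary cross-entropy plus a weighted L1 term) is a common choice for this kind of joint training, not one the abstract states.

```python
import numpy as np

def detect_highlights(image):
    """Hypothetical stand-in for the detection network.

    Returns a soft mask in [0, 1] of shape (H, W) from per-pixel brightness;
    the paper's detector is a learned encoder-decoder, not a threshold.
    """
    brightness = image.mean(axis=-1)
    return 1.0 / (1.0 + np.exp(-20.0 * (brightness - 0.8)))

def remove_highlights(image, mask):
    """Hypothetical stand-in for the removal network.

    Takes the image and the detector's mask, and returns a diffuse estimate.
    Masked pixels are pulled toward the per-pixel minimum channel, a crude
    diffuse proxy used here only to show how the mask guides the output.
    """
    diffuse_proxy = image.min(axis=-1, keepdims=True) * np.ones_like(image)
    m = mask[..., None]  # broadcast the (H, W) mask over the colour channels
    return (1.0 - m) * image + m * diffuse_proxy

def joint_loss(image, gt_mask, gt_diffuse, weight=1.0):
    """Assumed joint objective: detection BCE + weight * removal L1."""
    mask = detect_highlights(image)
    diffuse = remove_highlights(image, mask)
    p = np.clip(mask, 1e-7, 1.0 - 1e-7)  # avoid log(0)
    detection = -np.mean(gt_mask * np.log(p) + (1.0 - gt_mask) * np.log(1.0 - p))
    removal = np.mean(np.abs(diffuse - gt_diffuse))
    return detection + weight * removal
```

Because the removal term depends on the predicted mask, a single scalar loss ties the two stages together, which is the sense in which such a pipeline can be trained jointly.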
Funding: This work was partially funded by the National Natural Science Foundation of China (U21A20515, 62172416, 62172415, U2003109) and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (2022131).