Funding: supported by the National Natural Science Foundation of China (Nos. 61922046 and 62276145), the National Key Research and Development Program of China (No. 2018AAA0100400), and the Fundamental Research Funds for the Central Universities, China (No. 63223049).
Abstract: The SDR-to-HDR translation technique converts the abundant standard-dynamic-range (SDR) media resources into high-dynamic-range (HDR) ones, which can represent high-contrast scenes and thus provide more realistic visual experiences. While recent vision Transformers have achieved promising performance in many low-level vision tasks, few works have attempted to leverage Transformers for SDR-to-HDR translation. In this paper, we are among the first to investigate the performance of Transformers for SDR-to-HDR translation. We find that directly using the self-attention mechanism may introduce artifacts into the results, because it models long-range dependencies between the low-frequency and high-frequency components inappropriately. Taking this into account, we advance the self-attention mechanism and present dual frequency attention (DFA), which leverages self-attention to separately encode low-frequency structural information and high-frequency detail information. Based on the proposed DFA, we further design a multi-scale feature fusion network, named dual frequency Transformer (DFT), for efficient SDR-to-HDR translation. Extensive experiments on the HDRTV1K dataset demonstrate that our DFT achieves better quantitative and qualitative performance than recent state-of-the-art methods. The code of our DFT is made publicly available at https://github.com/CS-GangXu/DFT.
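To make the dual-frequency idea more concrete, below is a minimal, hypothetical PyTorch sketch; it is not the authors' released code (see the repository above for the actual implementation). It assumes a simple average-pooling low-pass split into low- and high-frequency feature maps, with a standard multi-head self-attention applied to each branch before fusion; the class name, pooling size, and fusion layer are all invented for illustration.

```python
# Hypothetical sketch of a dual-frequency attention block (not the authors' code).
# Low frequencies are approximated by average pooling (a crude low-pass filter);
# high frequencies are the residual. Each branch gets its own self-attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualFrequencyAttentionSketch(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4, pool: int = 4):
        super().__init__()
        self.pool = pool
        self.attn_low = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.attn_high = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Split into low-frequency structure and high-frequency detail.
        low = F.avg_pool2d(x, self.pool)
        low = F.interpolate(low, size=(h, w), mode="bilinear", align_corners=False)
        high = x - low

        def attend(feat, attn):
            tokens = feat.flatten(2).transpose(1, 2)            # (b, h*w, c)
            out, _ = attn(tokens, tokens, tokens)
            return out.transpose(1, 2).reshape(b, c, h, w)

        low = attend(low, self.attn_low)
        high = attend(high, self.attn_high)
        return x + self.fuse(torch.cat([low, high], dim=1))     # residual fusion

if __name__ == "__main__":
    block = DualFrequencyAttentionSketch(channels=32)
    y = block(torch.randn(1, 32, 16, 16))
    print(y.shape)  # torch.Size([1, 32, 16, 16])
```

The separate attention branches mirror the abstract's motivation: attending over a smoothed (structural) signal and over its detail residual independently keeps the two frequency ranges from interfering in a single attention map.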
Abstract: Patch-level features are essential for achieving good performance in computer vision tasks. Besides well-known pre-defined patch-level descriptors such as the scale-invariant feature transform (SIFT) and the histogram of oriented gradients (HOG), the kernel descriptor (KD) method [1] offers a new way to "grow up" features from a match kernel defined over image patch pairs using kernel principal component analysis (KPCA), and yields impressive results. In this paper, we present the efficient kernel descriptor (EKD) and the efficient hierarchical kernel descriptor (EHKD), which are built upon incomplete Cholesky decomposition. EKD automatically selects a small number of pivot features for generating patch-level features to achieve better computational efficiency. EHKD recursively applies EKD to form image-level features layer by layer. Perhaps due to parsimony, we find, surprisingly, that EKD and EHKD achieve competitive results on several public datasets compared with other state-of-the-art methods, at an improved efficiency over KD.
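As an illustration of the pivot-selection idea, here is a minimal NumPy sketch of pivoted incomplete Cholesky factorisation of a kernel matrix, the building block the abstract refers to. It is not the authors' EKD implementation: the RBF kernel, the greedy largest-residual pivot rule, and all function and variable names are assumptions made for this example.

```python
# A minimal sketch (not the authors' implementation) of pivoted incomplete
# Cholesky on a kernel matrix: K ~= G @ G.T, where only a few "pivot" columns
# of K are ever evaluated. EKD relies on this kind of low-rank factorisation
# to pick a small set of pivot features; the RBF kernel below is an assumption.
import numpy as np

def rbf_kernel_column(X, i, gamma=1.0):
    """k(x_j, x_i) for all j, evaluated lazily one column at a time."""
    d2 = np.sum((X - X[i]) ** 2, axis=1)
    return np.exp(-gamma * d2)

def incomplete_cholesky(X, max_rank=50, tol=1e-6, gamma=1.0):
    n = X.shape[0]
    G = np.zeros((n, max_rank))
    diag = np.ones(n)                    # k(x_i, x_i) = 1 for the RBF kernel
    pivots = []
    for j in range(max_rank):
        i = int(np.argmax(diag))         # greedy pivot: largest residual variance
        if diag[i] < tol:                # remaining error is negligible
            break
        pivots.append(i)
        col = rbf_kernel_column(X, i, gamma)                    # one kernel column
        G[:, j] = (col - G[:, :j] @ G[i, :j]) / np.sqrt(diag[i])
        diag = np.maximum(diag - G[:, j] ** 2, 0.0)             # update residuals
    return G[:, :len(pivots)], pivots

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))
    G, pivots = incomplete_cholesky(X, max_rank=40)
    K = np.exp(-np.sum((X[:, None] - X[None]) ** 2, axis=-1))   # full kernel, for checking only
    print(len(pivots), np.abs(K - G @ G.T).max())               # error shrinks as max_rank grows
```

The efficiency gain comes from the fact that only the columns of the kernel matrix indexed by the selected pivots are ever computed, so the cost grows with the number of pivots rather than with the full kernel matrix, which is the trade-off EKD exploits relative to the original KD pipeline.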