Video harmonization is an important step in video editing to achieve visual consistency by adjusting foreground appear-ances in both spatial and temporal dimensions.Previous methods always only harmonize on a single s...Video harmonization is an important step in video editing to achieve visual consistency by adjusting foreground appear-ances in both spatial and temporal dimensions.Previous methods always only harmonize on a single scale or ignore the inaccuracy of flow estimation,which leads to limited harmonization performance.In this work,we propose a novel architecture for video harmoniza-tion by making full use of spatiotemporal features and yield temporally consistent harmonized results.We introduce multiscale harmon-ization by using nonlocal similarity on each scale to make the foreground more consistent with the background.We also propose a fore-ground temporal aggregator to dynamically aggregate neighboring frames at the feature level to alleviate the effect of inaccurate estim-ated flow and ensure temporal consistency.The experimental results demonstrate the superiority of our method over other state-of-the-art methods in both quantitative and visual comparisons.展开更多
基金This work was supported by National Natural Science Foundation of China(No.62001432)the Fundamental Research Funds for the Central Universities,China(Nos.CUC18LG024 and CUC22JG001).
文摘Video harmonization is an important step in video editing to achieve visual consistency by adjusting foreground appear-ances in both spatial and temporal dimensions.Previous methods always only harmonize on a single scale or ignore the inaccuracy of flow estimation,which leads to limited harmonization performance.In this work,we propose a novel architecture for video harmoniza-tion by making full use of spatiotemporal features and yield temporally consistent harmonized results.We introduce multiscale harmon-ization by using nonlocal similarity on each scale to make the foreground more consistent with the background.We also propose a fore-ground temporal aggregator to dynamically aggregate neighboring frames at the feature level to alleviate the effect of inaccurate estim-ated flow and ensure temporal consistency.The experimental results demonstrate the superiority of our method over other state-of-the-art methods in both quantitative and visual comparisons.