Funding: Supported by the National Natural Science Foundation of China (Grant No. 61902210).
Abstract: Sparse-view 3D reconstruction has attracted increasing attention with the development of neural implicit 3D representations. Existing methods usually rely only on 2D views and therefore require a dense set of input views for accurate 3D reconstruction. In this paper, we show that accurate 3D reconstruction can be achieved by incorporating geometric priors into neural implicit 3D reconstruction. Our method adopts the signed distance function as the 3D representation and learns a generalizable 3D surface reconstruction model from sparse views. Specifically, we build a more effective, sparse feature volume from the input views by using the corresponding depth maps, which can be provided by depth sensors or predicted directly from the input views. We recover better geometric details by imposing both depth and surface normal constraints, in addition to the color loss, when training the neural implicit 3D representation. Experiments demonstrate that our method both outperforms state-of-the-art approaches and generalizes well.
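As a rough illustration of how depth and surface normal constraints can be combined with a color loss, the sketch below sums three per-ray terms. The abstract does not give the loss formulation, so the function name `combined_loss`, the L1 and cosine forms, and the weights `w_depth` and `w_normal` are all assumptions made for illustration, not the authors' definition.

```python
import numpy as np

def combined_loss(pred_rgb, gt_rgb, pred_depth, gt_depth,
                  pred_normal, gt_normal, w_depth=0.1, w_normal=0.05):
    """Hypothetical weighted sum of color, depth, and normal terms.

    Shapes: rgb (N, 3), depth (N,), normal (N, 3) for N sampled rays.
    The weights are illustrative placeholders, not values from the paper.
    """
    # Color term: mean absolute error between rendered and observed colors.
    color = np.abs(pred_rgb - gt_rgb).mean()
    # Depth term: mean absolute error against sensor or predicted depth.
    depth = np.abs(pred_depth - gt_depth).mean()
    # Normal term: 1 - cosine similarity between unit normals.
    eps = 1e-8
    pn = pred_normal / np.maximum(
        np.linalg.norm(pred_normal, axis=-1, keepdims=True), eps)
    gn = gt_normal / np.maximum(
        np.linalg.norm(gt_normal, axis=-1, keepdims=True), eps)
    normal = (1.0 - (pn * gn).sum(axis=-1)).mean()
    return color + w_depth * depth + w_normal * normal
```

With this form, a prediction that matches the targets exactly gives a loss of zero, and each geometric term contributes in proportion to its assumed weight.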
Funding: Supported by the National Natural Science Foundation of China (61462048).
Abstract: Existing depth video coding algorithms are generally based on in-loop depth filters, whose performance is unstable and easily affected by outliers. In this paper, we design a joint weighted sparse representation-based median filter as the in-loop filter in a depth video codec. It constructs a depth candidate set containing relevant neighboring depth pixels, based on sparse coding weighted by depth and intensity similarity; a median operation is then performed on this set to select a neighboring depth pixel as the filtering result. Experimental results indicate that the depth bitrate is reduced by about 9% compared with the anchor method, confirming that the proposed method is more effective in reducing the depth bitrate required for a given synthesis quality level.
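The candidate-set-then-median idea can be sketched in a much simplified form. The code below is not the sparse-coding formulation of the paper: it swaps in plain Gaussian similarity weights on depth and intensity, keeps the `k` highest-weighted neighbors as the candidate set, and returns their lower median so that the output is always an actual neighboring depth value. The function name and the parameters `radius`, `k`, `sigma_d`, and `sigma_i` are assumptions for illustration.

```python
import numpy as np

def similarity_median_filter(depth, intensity, radius=1, k=5,
                             sigma_d=10.0, sigma_i=10.0):
    """Simplified stand-in: similarity-weighted candidates, then median."""
    depth = np.asarray(depth, dtype=np.float64)
    intensity = np.asarray(intensity, dtype=np.float64)
    h, w = depth.shape
    out = depth.copy()
    for y in range(h):
        for x in range(w):
            # Clip the window at the image border.
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            d = depth[y0:y1, x0:x1].ravel()
            i = intensity[y0:y1, x0:x1].ravel()
            # Joint weight: similar depth AND similar intensity to the center.
            wgt = (np.exp(-((d - depth[y, x]) ** 2) / (2 * sigma_d ** 2))
                   * np.exp(-((i - intensity[y, x]) ** 2) / (2 * sigma_i ** 2)))
            # Candidate set: the k neighbors with the largest weights.
            top = np.argsort(-wgt, kind="stable")[:min(k, d.size)]
            cand = np.sort(d[top])
            # Lower median, so the result is one of the candidate pixels.
            out[y, x] = cand[(cand.size - 1) // 2]
    return out
```

Because the median is taken over several candidates rather than over the center pixel alone, an isolated outlier ends up at the extreme of the sorted candidate set and is replaced by a neighboring value.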
Funding: This work was supported in part by the National Natural Science Foundation of China (62171317 and 62122058).
Abstract: Novel viewpoint image synthesis is very challenging, especially from sparse views, due to large changes in viewpoint and occlusion. Existing image-based methods fail to generate reasonable results for invisible regions, while geometry-based methods have difficulty synthesizing detailed textures. In this paper, we propose STATE, an end-to-end deep neural network for sparse view synthesis that learns structure and texture representations. Structure is encoded as a hybrid feature field to predict reasonable structures for invisible regions while maintaining the original structures for visible regions, and texture is encoded as a deformed feature map to preserve detailed textures. We propose a hierarchical fusion scheme with intra-branch and inter-branch aggregation, in which spatio-view attention allows multi-view fusion at the feature level to adaptively select important information by regressing pixel-wise or voxel-wise confidence maps. By decoding the aggregated features, STATE is able to generate realistic images with reasonable structures and detailed textures. Experimental results demonstrate that our method achieves qualitatively and quantitatively better results than state-of-the-art methods. Our method also enables texture and structure editing applications, benefiting from the implicit disentanglement of structure and texture. Our code is available at http://cic.tju.edu.cn/faculty/likun/projects/STATE.
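Confidence-weighted multi-view fusion at the feature level can be illustrated with a minimal sketch. It assumes the per-view confidence maps are already available as raw scores (in STATE they are regressed by the network, which is not modeled here) and normalizes them across views with a softmax before averaging. The function name `fuse_views` and the array layout are hypothetical choices for this example, not taken from the released code.

```python
import numpy as np

def fuse_views(features, confidence_logits):
    """Fuse per-view feature maps with pixel-wise confidence weights.

    features:          (V, H, W, C) feature maps from V input views.
    confidence_logits: (V, H, W) unnormalized per-pixel confidence scores.
    Returns the fused (H, W, C) map and the (V, H, W) normalized weights.
    """
    features = np.asarray(features, dtype=np.float64)
    logits = np.asarray(confidence_logits, dtype=np.float64)
    # Softmax over the view axis (shifted by the max for numerical stability).
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)
    # Weighted sum over views, broadcasting weights across channels.
    fused = (weights[..., None] * features).sum(axis=0)
    return fused, weights
```

Equal scores reduce this to a plain average over views, while a strongly dominant score at a pixel makes the fused feature follow that single view there. The same pattern applies to voxel-wise confidence by adding a spatial axis.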