Abstract: Learning-based multi-view stereo (MVS) algorithms have demonstrated great potential for depth estimation in recent years. However, they still struggle to estimate accurate depth in texture-less planar regions, which limits their reconstruction performance in man-made scenes. In this paper, we propose PlaneStereo, a new framework that uses planar priors to facilitate depth estimation. Our key intuition is that pixels inside a plane share the same set of plane parameters, which can be estimated collectively using information from the whole plane. Specifically, our method first segments planes in the reference image and then fits 3D plane parameters for each segmented plane by solving a linear system built from high-confidence depth predictions inside the plane. This allows us to recover the plane parameters accurately. These parameters can then be converted into accurate depth values for every point in the plane, improving depth prediction in low-textured local regions. This process is fully differentiable and can be integrated into existing learning-based MVS algorithms. Experiments show that our method consistently improves the performance of existing stereo matching and MVS algorithms on the DeMoN and ScanNet datasets, achieving state-of-the-art performance.
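The plane-fitting step described above can be illustrated with a small least-squares sketch. It assumes a pinhole camera, for which the inverse depth of points on a plane is an affine function of pixel coordinates, 1/d = a·u + b·v + c, so confident depths inside one segment give a linear system in (a, b, c). The confidence threshold `tau` and the helper names `fit_plane` and `plane_depth` are hypothetical, and this plain (non-differentiable) solver is a stand-in, not PlaneStereo's implementation.

```python
from typing import List, Sequence, Tuple

Params = Tuple[float, float, float]


def _solve3(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [r] for row, r in zip(m, rhs)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate system: need 3+ non-collinear confident pixels")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x


def fit_plane(pixels: Sequence[Tuple[float, float]], depths: Sequence[float],
              confidences: Sequence[float], tau: float = 0.5) -> Params:
    """Least-squares fit of 1/d = a*u + b*v + c over the confident pixels of one segment."""
    ata = [[0.0] * 3 for _ in range(3)]  # normal equations A^T A
    atb = [0.0] * 3                      # right-hand side A^T y, with y = 1/d
    used = 0
    for (u, v), d, conf in zip(pixels, depths, confidences):
        if conf < tau or d <= 0:
            continue  # skip low-confidence or invalid depth predictions
        row = (u, v, 1.0)
        for i in range(3):
            atb[i] += row[i] / d
            for j in range(3):
                ata[i][j] += row[i] * row[j]
        used += 1
    if used < 3:
        raise ValueError("need at least 3 confident pixels")
    a, b, c = _solve3(ata, atb)
    return a, b, c


def plane_depth(params: Params, u: float, v: float) -> float:
    """Convert fitted plane parameters back into a depth value at pixel (u, v)."""
    a, b, c = params
    return 1.0 / (a * u + b * v + c)
```

Once a segment's parameters are fitted, `plane_depth` can be evaluated at every pixel of the segment, including low-textured ones whose own predictions were filtered out by `tau`.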
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61732015, 61932018, and 61472349, and the National Key Research and Development Program of China under Grant No. 2017YFB0202203.
Abstract: In multi-view stereo, unreliable matching in low-textured regions harms the completeness of reconstructed models. Because photometric consistency in low-textured regions is not discriminative within a local window, non-local information provided by a Markov Random Field (MRF) model can alleviate the matching ambiguity, but MRF inference is limited in continuous label spaces and has high computational complexity. Owing to their sampling and propagation strategy, PatchMatch multi-view stereo methods are well suited to optimizing continuous labeling problems. In this paper, we propose a novel method to address this problem, namely Coarse-Hypotheses Guided Non-Local PAtchMatch Multi-View Stereo (CNLPA-MVS), which combines the advantages of MRF-based non-local methods and PatchMatch multi-view stereo so that each compensates for the other's shortcomings. First, we combine dynamic programming (DP) and sequential propagation along scanlines, run in parallel, to obtain the optimal depth and normal hypotheses. Second, we introduce coarse inference within a universal window provided by winner-takes-all to eliminate the stripe artifacts caused by DP and improve completeness. Third, we add to CNLPA-MVS a local consistency strategy, based on the assumption that pixels of similar color have approximately equal hypotheses, to further improve completeness. CNLPA-MVS was validated on public benchmarks and achieved state-of-the-art performance with high completeness.
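To make the scanline DP step more concrete, the following is a generic Viterbi-style dynamic program along one scanline over a small, discrete set of candidate depths per pixel (for instance, hypotheses produced by PatchMatch sampling and propagation). The truncated-linear smoothness term, the weights `lam` and `trunc`, and the function names are illustrative assumptions rather than the CNLPA-MVS energy; a winner-takes-all (WTA) selector is included only for comparison.

```python
from typing import List, Sequence


def winner_takes_all(candidates: Sequence[Sequence[float]],
                     unary: Sequence[Sequence[float]]) -> List[float]:
    """Pick the lowest matching-cost hypothesis at each pixel independently."""
    return [c[min(range(len(u)), key=u.__getitem__)] for c, u in zip(candidates, unary)]


def scanline_dp(candidates: Sequence[Sequence[float]],
                unary: Sequence[Sequence[float]],
                lam: float = 1.0, trunc: float = 1.0) -> List[float]:
    """Minimize sum of unary costs + lam * min(|d_i - d_{i-1}|, trunc) along one scanline."""
    cost = list(unary[0])
    back: List[List[int]] = []  # back[i-1][k]: best predecessor of hypothesis k at pixel i
    for i in range(1, len(candidates)):
        new_cost, arg = [], []
        for k, dk in enumerate(candidates[i]):
            best, best_j = min(
                (cost[j] + lam * min(abs(dk - dj), trunc), j)
                for j, dj in enumerate(candidates[i - 1])
            )
            new_cost.append(best + unary[i][k])
            arg.append(best_j)
        cost = new_cost
        back.append(arg)
    k = min(range(len(cost)), key=cost.__getitem__)
    path = [k]
    for arg in reversed(back):  # trace the optimal labels back to the first pixel
        k = arg[k]
        path.append(k)
    path.reverse()
    return [candidates[i][path[i]] for i in range(len(candidates))]
```

With an ambiguous pixel whose matching costs are nearly tied, WTA may flip to a wrong depth while the DP keeps the scanline smooth. Because each scanline is optimized independently, nothing couples neighbouring rows, which is the usual source of the stripe artifacts mentioned in the abstract.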
Funding: Partly supported by JSPS KAKENHI JP15K16027, JP26700013, JP15H05918, and JP19H04138, JST CREST JP179423, and the Foundation for Nara Institute of Science and Technology.
Abstract: In this paper, we present a practical method for reconstructing the bidirectional reflectance distribution function (BRDF) from multiple images of a real object composed of a homogeneous material. The key idea is that the BRDF can be sampled after the geometry has been estimated with multi-view stereo (MVS) techniques. Our contribution is the selection of reliable samples of lighting, surface normal, and viewing directions, which provides robustness against MVS estimation errors. Our method is evaluated quantitatively on synthesized images, and its effectiveness is demonstrated through real-world experiments.
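Under a distant light of irradiance E, the observed radiance of a surface point is L = f_r · E · cos θ_i, so every pixel with an MVS-estimated normal yields one BRDF sample f_r = L / (E · cos θ_i). The sketch below assumes a single directional light, unit-less intensities, and a hypothetical threshold `min_cos`; it discards samples at grazing light or view angles, where a small normal error changes cos θ by a large relative amount (the relative error grows roughly like tan θ). This is one plausible reliability test, not the selection criterion used in the paper.

```python
import math
from typing import Optional, Sequence, Tuple

Vec = Sequence[float]


def _normalize(v: Vec) -> Tuple[float, float, float]:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def brdf_sample(intensity: float, irradiance: float, normal: Vec, light: Vec,
                view: Vec, min_cos: float = 0.2) -> Optional[Tuple[float, float]]:
    """Return (theta_h, f_r) for one observation, or None if the sample looks unreliable."""
    n, l, v = _normalize(normal), _normalize(light), _normalize(view)
    cos_i, cos_o = _dot(n, l), _dot(n, v)
    if cos_i < min_cos or cos_o < min_cos:
        return None  # grazing geometry: normal errors would dominate the estimate
    h = _normalize((l[0] + v[0], l[1] + v[1], l[2] + v[2]))  # half vector
    theta_h = math.acos(max(-1.0, min(1.0, _dot(n, h))))
    return theta_h, intensity / (irradiance * cos_i)
```

Both cosines must exceed `min_cos`, so `l` and `v` can never be opposite and the half vector is always well defined.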
Funding: Supported by the National Natural Science Foundation of China (U21A20487), the Shenzhen Technology Project (JCYJ20180507182610734), and the CAS Key Technology Talent Program.
Abstract: Visual SLAM methods usually presuppose that the scene is static, so SLAM algorithms for mobile robots in dynamic scenes often suffer a significant drop in accuracy due to the influence of dynamic objects. In this paper, feature points are classified as dynamic or static using semantic information and multi-view geometry information; the feature points in static regions are then used for pose optimization, and static scene maps are built for dynamic scenes. Finally, experiments are conducted on dynamic scenes from the KITTI dataset, and the results show that the proposed algorithm achieves higher accuracy in highly dynamic scenes than the visual SLAM baseline.
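The dynamic/static split can be sketched as a per-feature rule that combines a semantic label with an epipolar check: for a static point matched between two frames, x₂ should lie on the epipolar line F·x₁. The movable class set, the distance threshold, and the simple OR rule below are hypothetical choices for illustration, since the abstract does not specify them; the fundamental matrix F is assumed to be estimated beforehand from all matches (for example with RANSAC).

```python
import math
from typing import List, Sequence, Tuple

MOVABLE_CLASSES = {"person", "rider", "car", "bicycle"}  # hypothetical label set

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def epipolar_distance(F: Matrix3, x1: Point, x2: Point) -> float:
    """Distance from x2 to the epipolar line F @ [x1, 1] in the second image."""
    p1 = (x1[0], x1[1], 1.0)
    line = [sum(F[r][c] * p1[c] for c in range(3)) for r in range(3)]
    norm = math.hypot(line[0], line[1])
    if norm == 0.0:
        return math.inf  # x1 sits at the epipole, so the line is undefined
    return abs(line[0] * x2[0] + line[1] * x2[1] + line[2]) / norm


def is_dynamic(label: str, F: Matrix3, x1: Point, x2: Point, max_dist: float = 1.0) -> bool:
    """Flag a feature as dynamic if its class is movable or it violates epipolar geometry."""
    return label in MOVABLE_CLASSES or epipolar_distance(F, x1, x2) > max_dist


def static_features(features: Sequence[Tuple[str, Point, Point]], F: Matrix3,
                    max_dist: float = 1.0) -> List[Tuple[Point, Point]]:
    """Keep only the (x1, x2) matches judged static; these would feed pose optimization."""
    return [(x1, x2) for label, x1, x2 in features
            if not is_dynamic(label, F, x1, x2, max_dist)]
```

The semantic test catches objects that are moving consistently with the camera (and so pass the epipolar check), while the geometric test catches moving objects whose class is not in the movable set.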
Funding: Supported by the Zhengzhou Collaborative Innovation Major Project under Grant No. 20XTZX06013 and the Henan Provincial Key Scientific Research Project of China under Grant No. 22A520042.
Abstract: Traditional neural radiance fields for novel view rendering require dense input images and per-scene optimization, which limits their practical applications. We propose SG-NeRF (Sparse-Input Generalized Neural Radiance Fields), a generalizable method that infers scenes from input images and performs high-quality rendering without per-scene optimization. First, we construct an improved multi-view stereo structure based on convolutional attention and a multi-level fusion mechanism to obtain geometric and appearance features of the scene from sparse input images; these features are then aggregated by multi-head attention and used as the input of the neural radiance field. Because the neural radiance field decodes scene features instead of mapping positions and orientations directly, our method supports cross-scene training and inference, which enables neural radiance fields to generalize to novel view synthesis on unseen scenes. We tested the generalization ability on the DTU dataset: under the same input conditions, our PSNR (peak signal-to-noise ratio) is 3.14 dB higher than that of the baseline method. In addition, when dense input views of a scene are available, a short round of refinement training improves the average PSNR by 1.04 dB and yields higher-quality rendering.
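Because PSNR is logarithmic, PSNR = 10 · log₁₀(MAX² / MSE), a fixed gain in dB corresponds to a multiplicative drop in mean squared error. The small helpers below (names are illustrative) compute PSNR for flattened images with values in [0, 1] and convert a PSNR gain into the MSE ratio it implies for a single image pair: 3.14 dB means roughly a 2.06 times lower MSE and 1.04 dB roughly 1.27 times. A PSNR averaged over many scenes does not map exactly onto one MSE ratio.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two flattened images of equal size."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    err = mse(pred, target)
    return math.inf if err == 0 else 10.0 * math.log10(max_val ** 2 / err)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE must shrink to raise PSNR by gain_db (same max_val)."""
    return 10.0 ** (gain_db / 10.0)
```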
Funding: Supported by the Theme-based Research Scheme, Research Grants Council of Hong Kong (No. T45-205/21-N).
Abstract: Novel view synthesis has recently attracted tremendous research attention for its applications in virtual reality and immersive telepresence. Rendering a locally immersive light field (LF) from RGB references with arbitrarily large baselines is a challenging problem for which existing novel view synthesis techniques lack efficient solutions. In this work, we aim to faithfully render locally immersive novel views/LF images from large-baseline LF captures and a single RGB image in the target view. To fully exploit the valuable information in the source LF captures, we propose a novel occlusion-aware source sampler (OSS) module, which efficiently transfers pixels of the source views into the target view's frustum in an occlusion-aware manner. An attention-based deep visual fusion module is proposed to fuse the revealed occluded background content with a preliminary LF into a final refined LF. The proposed source sampling and fusion mechanism not only provides information for occluded regions from varying observation angles but also effectively enhances visual rendering quality. Experimental results show that our method renders high-quality LF images/novel views from sparse RGB references and outperforms state-of-the-art LF rendering and novel view synthesis methods.
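The idea of transferring source pixels into the target view while respecting occlusion can be illustrated with classic z-buffered forward warping between two rectified views. This is a hand-written stand-in, not the learned OSS module: it assumes a purely horizontal baseline, known per-pixel depth in the source view, disparity = focal · baseline / depth, and a hypothetical function name. When several source pixels land on the same target pixel the nearest one wins, and target pixels that no source pixel reaches stay empty (`None`); such disoccluded holes are what a fusion stage has to fill.

```python
import math
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def forward_warp(image: Sequence[Sequence[T]], depth: Sequence[Sequence[float]],
                 focal: float, baseline: float) -> List[List[Optional[T]]]:
    """Warp a source view into a horizontally shifted target view using a z-buffer."""
    h, w = len(image), len(image[0])
    out: List[List[Optional[T]]] = [[None] * w for _ in range(h)]
    zbuf = [[math.inf] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            z = depth[y][x]
            if z <= 0:
                continue  # invalid depth
            xt = round(x - focal * baseline / z)  # shift by the disparity
            if 0 <= xt < w and z < zbuf[y][xt]:
                zbuf[y][xt] = z  # a nearer surface occludes whatever was written before
                out[y][xt] = image[y][x]
    return out
```

In the one-row example used for testing, the near pixel `"c"` jumps two columns and hides the far pixel `"a"`, leaving a hole where `"c"` used to be.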