Simultaneous localization and mapping (SLAM) plays a crucial role in VR/AR applications, autonomous robot navigation, UAV remote control, and related fields. Traditional SLAM handles data acquired by a fast-moving or severely jittering camera poorly, and its efficiency needs improvement. This paper proposes an improved SLAM algorithm that mainly improves the real-time performance of the classical algorithm: it applies a KD-tree to organize feature points efficiently and accelerates the building of feature-point correspondences. Moreover, the background map-reconstruction thread is optimized, which increases the parallel computation ability of the SLAM system. Experiments on color images demonstrate that the improved SLAM algorithm achieves better real-time performance than the classical SLAM.
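The abstract does not detail the matching step; the following is a minimal illustrative sketch (not the authors' implementation) of how a KD-tree can accelerate correspondence building, assuming real-valued descriptors such as SIFT. Binary descriptors such as ORB would need a Hamming-space index instead, and all names here are hypothetical.

```python
# Minimal sketch: KD-tree accelerated feature matching with a ratio test.
# Assumes real-valued descriptors; illustrative only.
import numpy as np
from scipy.spatial import cKDTree

def match_features(desc_prev, desc_curr, ratio=0.8):
    """Return index pairs (i, j) matching rows of desc_prev to desc_curr."""
    tree = cKDTree(desc_curr)                # O(n log n) build
    dists, idx = tree.query(desc_prev, k=2)  # two nearest neighbours per query
    matches = []
    for i, (d, j) in enumerate(zip(dists, idx)):
        if d[0] < ratio * d[1]:              # Lowe-style ratio test
            matches.append((i, j[0]))
    return matches

# Toy usage with random 128-D "descriptors"
rng = np.random.default_rng(0)
prev = rng.standard_normal((500, 128)).astype(np.float32)
curr = prev + 0.05 * rng.standard_normal((500, 128)).astype(np.float32)
print(len(match_features(prev, curr)), "tentative correspondences")
```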
Research on neural radiance fields for novel view synthesis has grown explosively with the development of new models and extensions. NeRF (Neural Radiance Fields) variants suited to underwater scenes and scattering media are also evolving, yet existing underwater 3D reconstruction systems still face challenges such as long training times and low rendering efficiency. This paper proposes an improved underwater 3D reconstruction system to achieve rapid, high-quality reconstruction. First, we enhance underwater videos captured by a monocular camera to correct the image-quality degradation caused by the physical properties of the water medium, while keeping the enhancement consistent across frames. Then we perform keyframe selection to optimize resource usage and reduce the impact of dynamic objects on the reconstruction results. After pose estimation with COLMAP, the selected keyframes are reconstructed using neural radiance fields based on multi-resolution hash encoding for model construction and rendering. For image enhancement, our method proves effective in the tested scenarios and yields better continuity between consecutive frames of the same sequence. For 3D reconstruction, our method achieves a peak signal-to-noise ratio (PSNR) of 18.40 dB and a structural similarity (SSIM) of 0.6677, a good balance between running efficiency and reconstruction quality.
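The system builds on multi-resolution hash encoding in the style of Instant-NGP. As a rough illustration of that encoding, and not the paper's implementation, here is a numpy sketch: 3D coordinates are hashed into per-level feature tables and trilinearly interpolated, and the concatenated features feed the NeRF MLP. The constants (hash primes, table size, growth factor) follow common Instant-NGP defaults and are assumptions here.

```python
# Sketch of multi-resolution hash encoding (Instant-NGP style), numpy only.
# In the real system the feature tables are trained jointly with the NeRF MLP.
import numpy as np

PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_coords(ijk, table_size):
    # ijk: (..., 3) integer voxel-corner coordinates
    h = np.zeros(ijk.shape[:-1], dtype=np.uint64)
    for d in range(3):
        h ^= ijk[..., d].astype(np.uint64) * PRIMES[d]
    return (h % table_size).astype(np.int64)

def encode(xyz, tables, n_min=16, growth=1.5):
    """xyz in [0,1]^3; tables: list of (T, F) feature arrays, one per level."""
    feats = []
    for level, table in enumerate(tables):
        res = int(n_min * growth ** level)
        x = xyz * res
        i0 = np.floor(x).astype(np.int64)   # lower voxel corner
        w = x - i0                          # trilinear weights
        f = 0.0
        for corner in range(8):             # 8 voxel corners
            offs = np.array([(corner >> d) & 1 for d in range(3)])
            idx = hash_coords(i0 + offs, table.shape[0])
            cw = np.prod(np.where(offs, w, 1 - w), axis=-1, keepdims=True)
            f = f + cw * table[idx]
        feats.append(f)
    return np.concatenate(feats, axis=-1)   # input to the NeRF MLP

rng = np.random.default_rng(0)
tables = [rng.standard_normal((2**14, 2)).astype(np.float32) for _ in range(4)]
print(encode(rng.random((5, 3)), tables).shape)  # (5, 8)
```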
Reconstructing dynamic scenes with commodity depth cameras has many applications in computer graphics, computer vision, and robotics. However, because capture devices produce noisy and erroneous observations, and non-rigid registration is inherently ill-posed given insufficient information, traditional approaches often produce low-quality geometry with holes, bumps, and misalignments. We propose a novel 3D dynamic reconstruction system, named HDR-Net-Fusion, which learns to simultaneously reconstruct and refine the geometry on the fly with a sparse embedded deformation graph of surfels, using a hierarchical deep reinforcement (HDR) network. The network comprises two parts: a global HDR-Net that rapidly detects local regions with large geometric errors, and a local HDR-Net that serves as a patch refinement operator to promptly complete and enhance such regions. Training the global HDR-Net is formulated as a novel reinforcement-learning problem that implicitly learns the region-selection strategy with the goal of improving overall reconstruction quality. The applicability and efficiency of our approach are demonstrated on a large-scale dynamic reconstruction dataset, and our method reconstructs geometry with higher quality than traditional methods.
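The exact network and reward are not given in the abstract; the following is a schematic sketch of the reinforcement-learning formulation under stated assumptions: the state is a set of per-region geometric error estimates, the action is which region to hand to the local refinement operator, and the reward is the resulting error reduction. The value table and refinement stand-in are toys, not the HDR network.

```python
# Schematic sketch of region-selection RL (not the authors' network):
# state  = per-region geometric error estimates,
# action = index of the region handed to the local refinement operator,
# reward = reduction in total reconstruction error after refinement.
import numpy as np

rng = np.random.default_rng(0)

def refine_region(errors, k):
    """Stand-in for the local refinement net: shrink error of region k."""
    out = errors.copy()
    out[k] *= 0.3
    return out

q = np.zeros(16)                    # toy value table over 16 regions
errors = rng.random(16)
alpha, eps = 0.5, 0.2
for step in range(50):
    if rng.random() < eps:          # epsilon-greedy exploration
        k = rng.integers(16)
    else:
        k = int(np.argmax(q))
    new_errors = refine_region(errors, k)
    reward = errors.sum() - new_errors.sum()
    q[k] += alpha * (reward - q[k]) # bandit-style value update
    errors = new_errors
print("total error after refinement:", errors.sum())
```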
Real-time dense reconstruction of indoor scenes is of great research value for service robots, augmented reality, cultural-relics conservation, and other fields. ORB-SLAM2 is one of the best open-source visual SLAM systems and is often used for indoor scene reconstruction. However, it is time-consuming and builds only a sparse scene map, because it solves the camera pose from ORB features. To address these shortcomings, this article proposes an improved ORB-SLAM2 solution that uses a direct method based on light intensity to solve the camera pose. This greatly reduces the amount of computation: the speed improves by about 5 times compared with the ORB feature method. A parallel map-reconstruction thread based on a surfel model is added, and depth maps and RGB maps are fused to build the dense map. A RealSense D415 sensor is used as the RGB-D camera to obtain three-dimensional (3D) point clouds of an indoor environment. After calibration and alignment, the sensor is applied in an indoor-scene reconstruction experiment with the improved ORB-SLAM2 method. Results show that the improved ORB-SLAM2 algorithm yields a great improvement in processing speed and in the density of the reconstructed scenes.
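To illustrate the direct (photometric) principle, here is a toy sketch that estimates a 2-D translation between two images by Gauss-Newton on raw intensities. The actual method optimizes a full SE(3) camera pose over depth-reprojected pixels; this reduced example only shows why no feature extraction is needed.

```python
# Toy sketch of direct (photometric) alignment: estimate a 2-D translation
# between two images by Gauss-Newton on raw intensities.
import numpy as np
from scipy import ndimage

def photometric_align(ref, cur, iters=20):
    t = np.zeros(2)                               # (dy, dx) estimate
    gy, gx = np.gradient(cur.astype(np.float64))
    for _ in range(iters):
        warped = ndimage.shift(cur, -t, order=1)  # warped[y] = cur[y + t]
        r = (warped - ref).ravel()                # photometric residuals
        Jy = ndimage.shift(gy, -t, order=1).ravel()
        Jx = ndimage.shift(gx, -t, order=1).ravel()
        J = np.stack([Jy, Jx], axis=1)
        dt, *_ = np.linalg.lstsq(J, -r, rcond=None)  # Gauss-Newton step
        t += dt
    return t

rng = np.random.default_rng(0)
img = ndimage.gaussian_filter(rng.random((64, 64)), 3)
shifted = ndimage.shift(img, (1.5, -2.0), order=1)
print(photometric_align(img, shifted))            # ~ [ 1.5 -2.0 ]
```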
The 3D reconstruction pipeline uses the Bundle Adjustment (BA) algorithm to refine the camera and point parameters. Bundle Adjustment is compute-intensive, and many researchers have improved its performance by implementing it on GPUs. In the previous work, "Improving Accuracy and Computational Burden of Bundle Adjustment Algorithm using GPUs," the authors first improved the algorithm's accuracy, reducing the mean square error by adding a radial distortion parameter and using explicitly computed analytical derivatives, and then reduced its computational burden using GPUs. With the naïve CUDA implementation, a speedup of 10× was achieved for the largest dataset of 13,678 cameras, 4,455,747 points, and 28,975,571 projections. In this paper, we optimize the Bundle Adjustment CUDA code to achieve a higher speedup. We propose a new memory layout for the parameters of the BA algorithm that results in contiguous memory access, and we demonstrate that it improves memory throughput on the GPU, thereby improving overall performance. We also increase the computational throughput of the algorithm by optimizing the CUDA kernels to utilize the GPU resources effectively, and we present a comparative performance study of explicitly computing an algorithm parameter versus using the Jacobians instead. In the previous work, the BA algorithm failed to converge for certain datasets because several block matrices of the cameras in the augmented normal equation were rank-deficient. In this work, we identify the cameras that cause rank-deficient matrices and preprocess the datasets to ensure the convergence of the BA algorithm. Our optimized CUDA implementation converges in around 22 seconds for the largest dataset, compared with 654 seconds for the sequential implementation, a speedup of 30×, and it achieves a 3× speedup over the previous naïve CUDA implementation for the largest dataset.
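The following is a minimal sketch of the preprocessing idea only, assuming 9-parameter cameras as in the common BAL-style datasets (3 rotation, 3 translation, focal length, two distortion terms); the block construction is schematic, since a real BA assembles the Jacobian from the projection model. It flags cameras whose block of the normal equations is numerically rank-deficient before the solver runs.

```python
# Sketch: flag cameras whose 9x9 block of the augmented normal equations
# is rank-deficient (e.g., too few observations) before running BA.
import numpy as np

def find_bad_cameras(jac_blocks, tol=1e-8):
    """jac_blocks: list of (n_obs_i, 9) per-camera Jacobian blocks."""
    bad = []
    for cam_id, J in enumerate(jac_blocks):
        H = J.T @ J                         # 9x9 camera block of J^T J
        s = np.linalg.svd(H, compute_uv=False)
        if s[-1] < tol * s[0]:              # numerically rank-deficient
            bad.append(cam_id)
    return bad

rng = np.random.default_rng(0)
blocks = [rng.standard_normal((50, 9)) for _ in range(4)]
blocks.append(rng.standard_normal((3, 9)))  # too few observations -> deficient
print(find_bad_cameras(blocks))             # [4]
```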
Bundle adjustment is a camera- and point-refinement technique in a 3D scene reconstruction pipeline. The camera parameters and the 3D points are refined by minimizing the difference between the computed and observed projections of the image points, formulated as a non-linear least-squares problem that is solved with the Levenberg-Marquardt method. Solving this problem is computationally expensive, with cost proportional to the number of cameras, points, and projections. In this paper, we implement the Bundle Adjustment (BA) algorithm and analyze techniques that improve its accuracy by reducing the mean square error. We investigate adding a radial distortion camera parameter to the BA algorithm and demonstrate better convergence of the mean square error, and we also demonstrate the use of explicitly computed analytical derivatives. In addition, we implement the BA algorithm on GPUs using the CUDA parallel programming model to reduce its computational burden. CUDA streams, atomic operations, and the cuBLAS library are proposed, implemented, and demonstrated to improve the performance of the BA algorithm. Our implementation demonstrates better convergence of the BA algorithm and achieves a speedup of up to 16× across various datasets.
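For readers unfamiliar with the formulation, here is a toy bundle-adjustment sketch: camera translations and 3-D points are refined jointly by non-linear least squares on reprojection error, with scipy's Levenberg-Marquardt solver standing in for the paper's implementation. Rotations are held at identity, the focal length is an assumed constant, and the first camera is fixed to remove the gauge freedom; all of that is simplification, not the paper's model.

```python
# Toy bundle adjustment: refine camera translations and 3-D points by
# minimizing reprojection error (Levenberg-Marquardt via scipy).
import numpy as np
from scipy.optimize import least_squares

F = 500.0  # assumed focal length in pixels

def project(points, cam_t):
    p = points + cam_t                      # identity rotation for brevity
    return F * p[:, :2] / p[:, 2:3]         # pinhole projection

def residuals(x, cam0, n_pts, obs):
    cams = np.vstack([cam0, x[:6].reshape(2, 3)])  # camera 0 fixed (gauge)
    pts = x[6:].reshape(n_pts, 3)
    return np.concatenate(
        [(project(pts, c) - o).ravel() for c, o in zip(cams, obs)])

rng = np.random.default_rng(0)
n_pts = 40
true_cams = np.array([[0, 0, 5.0], [0.4, 0, 5.0], [0, 0.4, 5.0]])
true_pts = rng.normal(0, 1.0, (n_pts, 3))
obs = np.stack([project(true_pts, t) for t in true_cams])
obs += rng.normal(0, 0.3, obs.shape)        # pixel noise

x0 = np.concatenate([true_cams[1:].ravel(), true_pts.ravel()])
x0 += rng.normal(0, 0.05, x0.size)          # perturbed initial guess
sol = least_squares(residuals, x0, args=(true_cams[0], n_pts, obs), method="lm")
print("RMS reprojection error:", np.sqrt(np.mean(sol.fun ** 2)))
```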
A new large-scale three-dimensional (3D) reconstruction technology based on integral imaging with color-position characteristics is presented. The color of an object point is similar to the colors of its corresponding points in the elemental images. The coordinates of the corresponding points form arithmetic progressions, because integral imaging captures information with a sensor array whose pitches are equal in the x and y directions. This regular relationship is used to determine the corresponding-point parameters for reconstructing 3D information from elemental images that are segmented by color and contain several corresponding points. The feasibility of the proposed method is demonstrated through an optical indoor experiment, and a large-scale application of the method is illustrated by an experiment with a corner of our school as its object.
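As a worked illustration of the arithmetic-progression property, consider one object point appearing in successive elemental images at x-coordinates that step by a constant disparity. Under the standard integral-imaging geometry the depth then follows from the pitch p, the lens-sensor gap g, and the per-element disparity d as z = g·p/d. All numbers below are illustrative assumptions, not the paper's data.

```python
# Worked sketch: corresponding-point coordinates form an arithmetic
# progression; depth follows from the constant per-element disparity.
import numpy as np

g, p = 3.0e-3, 1.0e-3      # lens-sensor gap and lens pitch in metres (assumed)
x = np.array([120.0, 132.5, 145.0, 157.5])  # pixel x in successive elemental images
steps = np.diff(x)
assert np.allclose(steps, steps[0])          # arithmetic-progression check
pixel_size = 5.0e-6                          # assumed sensor pixel size
d = steps[0] * pixel_size                    # disparity in metres
print("reconstructed depth z =", g * p / d, "m")
```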
Existing depth completion methods are often targeted at a specific sparse depth type and generalize poorly across task domains. We present a method to complete sparse/semi-dense, noisy, and potentially low-resolution depth maps obtained by various range sensors, including those in modern mobile phones, or by multi-view reconstruction algorithms. Our method leverages a data-driven prior in the form of a single-image depth prediction network trained on large-scale datasets, whose output is used as an input to our model. We propose an effective training scheme in which we simulate various sparsity patterns from typical task domains. In addition, we design two new benchmarks to evaluate the generalizability and robustness of depth completion methods. Our simple method shows superior cross-domain generalization ability compared with state-of-the-art depth completion methods, providing a practical solution to high-quality depth capture on a mobile device.
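The training-scheme idea can be illustrated with a small sketch: several sparsity patterns are simulated on a dense ground-truth depth map so that one model sees inputs resembling different sensor types. The pattern names and parameters below are illustrative assumptions, not the paper's exact settings.

```python
# Sketch: simulate sparsity patterns on a dense depth map for training.
import numpy as np

def sparsify(depth, pattern, rng):
    mask = np.zeros_like(depth, dtype=bool)
    h, w = depth.shape
    if pattern == "random":        # e.g., sparse SfM / multi-view points
        mask[rng.random((h, w)) < 0.02] = True
    elif pattern == "lines":       # e.g., LiDAR-style scan lines
        mask[::8, :] = True
    elif pattern == "lowres":      # e.g., low-resolution phone depth sensor
        mask[::4, ::4] = True
    out = np.where(mask, depth, 0.0)   # zeros mark missing depth
    return out, mask

rng = np.random.default_rng(0)
dense = 1.0 + rng.random((480, 640)).astype(np.float32)
for pat in ("random", "lines", "lowres"):
    sparse, m = sparsify(dense, pat, rng)
    print(pat, "keeps", round(m.mean() * 100, 2), "% of pixels")
```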
Mixed reality technologies provide real-time, immersive experiences, which bring tremendous opportunities in entertainment, education, and enriched experiences that are not directly accessible owing to safety or cost. Research in this field has been in the spotlight in the last few years as the metaverse went viral. The recently emerging omnidirectional video streams, i.e., 360° videos, provide an affordable way to capture and present dynamic real-world scenes. In the last decade, fueled by the rapid development of artificial intelligence and computational photography, research interest in mixed reality systems that use 360° videos for richer and more realistic experiences has increased dramatically, aiming to unlock the true potential of the metaverse. In this survey, we cover recent research on 360° image and video processing technologies and their applications for mixed reality, summarize its contributions, and describe potential future research directions for 360° media in the field of mixed reality.
A background removal method based on two-dimensional notch filtering in the frequency domain for polarization interference imaging spectrometers (PIISs) is implemented. Based on the relationship between the spatial domain and the frequency domain, the notch filter is designed from several PIIS parameters, and an interferogram without a background is obtained. Both simulated and experimental results demonstrate that the background removal method is feasible and robust, with a high processing speed. In addition, the method reduces the noise level of the reconstructed spectrum, and it is insensitive to a complicated background compared with the polynomial fitting and empirical mode decomposition (EMD) methods.
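To illustrate the principle, here is a minimal frequency-domain sketch: the interferogram is transformed with a 2-D FFT, the low-frequency background lobe around DC is suppressed with a Gaussian notch, and the result is inverted. In the paper the notch placement and width come from the PIIS parameters; the values and toy interferogram below are illustrative assumptions.

```python
# Minimal sketch of 2-D frequency-domain notch filtering for background removal.
import numpy as np

def notch_filter(img, sigma=5.0):
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    cy, cx = h // 2, w // 2
    # Inverted Gaussian centred on DC suppresses the slowly varying background.
    notch = 1.0 - np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * notch)))

# Toy interferogram: cosine fringes on a slowly varying background
y, x = np.mgrid[:256, :256]
fringes = np.cos(2 * np.pi * x / 8.0)
background = np.exp(-((x - 128) ** 2 + (y - 128) ** 2) / (2 * 80.0 ** 2))
clean = notch_filter(fringes + 3 * background)
print("peak before:", (fringes + 3 * background).max(), "after:", clean.max())
```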
Funding (improved SLAM paper): This work is supported by the National Natural Science Foundation of China (Grant No. 61672279), the Project of "Six Talents Peak" in Jiangsu (2012-WLW-023), and the Open Foundation of the State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Nanjing Hydraulic Research Institute, China (2016491411).
Funding (underwater 3D reconstruction paper): This work was supported by the Key Research and Development Program of Hainan Province (Grant Nos. ZDYF2023GXJS163, ZDYF2024GXJS014), the National Natural Science Foundation of China (NSFC) (Grant Nos. 62162022, 62162024), the Major Science and Technology Project of Hainan Province (Grant No. ZDKJ2020012), the Hainan Provincial Natural Science Foundation of China (Grant No. 620MS021), and the Youth Foundation Project of the Hainan Natural Science Foundation (621QN211).
Funding (HDR-Net-Fusion paper): This work was supported by the National Natural Science Foundation of China (Grant Nos. 61902210 and 61521002).
Funding (improved ORB-SLAM2 paper): This work was supported by the Henan Province Science and Technology Project under Grant No. 182102210065.
Funding (integral imaging paper): Supported by the National Natural Science Foundation of China (No. 11474169).
Funding (mixed reality survey): Supported by the Marsden Fund Council managed by the Royal Society of New Zealand under Grant Nos. MFP-20-VUW-180 and UOO1724, Zhejiang Province Public Welfare Technology Application Research under Grant No. LGG22F020009, and the Key Lab of Film and TV Media Technology of Zhejiang Province of China under Grant No. 2020E10015.
Funding (background removal paper): Supported by the Major Program of the National Natural Science Foundation of China (No. 41530422), the National Science and Technology Major Project of the Ministry of Science and Technology of China (No. 32-Y30B08-9001-13/15), the National Natural Science Foundation of China (Nos. 61275184, 61540018, 61405153, and 60278019), and the National High Technology Research and Development Program of China (No. 2012AA121101).