摘要
Bundle adjustment is a camera and point refinement technique in a 3D scene reconstruction pipeline. The camera parameters and the 3D points are refined by minimizing the difference between computed projection and observed projection of the image points formulated as a non-linear least-square problem. Levenberg-Marquardt method is used to solve the non-linear least-square problem. Solving the non-linear least-square problem is computationally expensive, proportional to the number of cameras, points, and projections. In this paper, we implement the Bundle Adjustment (BA) algorithm and analyze techniques to improve algorithmic performance by reducing the mean square error. We investigate using an additional radial distortion camera parameter in the BA algorithm and demonstrate better convergence of the mean square error. We also demonstrate the use of explicitly computed analytical derivatives. In addition, we implement the BA algorithm on GPUs using the CUDA parallel programming model to reduce the computational time burden of the BA algorithm. CUDA Streams, atomic operations, and cuBLAS library in the CUDA programming model are proposed, implemented, and demonstrated to improve the performance of the BA algorithm. Our implementation has demonstrated better convergence of the BA algorithm and achieved a speedup of up to 16× on the use of the BA algorithm on various datasets.
Bundle adjustment is a camera and point refinement technique in a 3D scene reconstruction pipeline. The camera parameters and the 3D points are refined by minimizing the difference between computed projection and observed projection of the image points formulated as a non-linear least-square problem. Levenberg-Marquardt method is used to solve the non-linear least-square problem. Solving the non-linear least-square problem is computationally expensive, proportional to the number of cameras, points, and projections. In this paper, we implement the Bundle Adjustment (BA) algorithm and analyze techniques to improve algorithmic performance by reducing the mean square error. We investigate using an additional radial distortion camera parameter in the BA algorithm and demonstrate better convergence of the mean square error. We also demonstrate the use of explicitly computed analytical derivatives. In addition, we implement the BA algorithm on GPUs using the CUDA parallel programming model to reduce the computational time burden of the BA algorithm. CUDA Streams, atomic operations, and cuBLAS library in the CUDA programming model are proposed, implemented, and demonstrated to improve the performance of the BA algorithm. Our implementation has demonstrated better convergence of the BA algorithm and achieved a speedup of up to 16× on the use of the BA algorithm on various datasets.