Abstract
In the clMAGMA library for the OpenCL parallel computing framework, the Cholesky decomposition algorithm uses a large-block parallel method, which cannot make full use of the GPU's high-speed local memory and requires multiple GPU-CPU data transfers during computation. To address this, a small-block parallel method is proposed. It makes full use of the GPU's high-speed local memory and reuses the inverses of matrix sub-blocks to perform efficient Cholesky decomposition of symmetric positive definite matrices, and it can be applied to the decomposition of large positive definite matrices arising in bundle adjustment for three-dimensional vision. Experimental results show that the proposed method performs Cholesky decomposition more than 50% faster than clMAGMA and, on bundle adjustment problems, about 38 times faster than the Eigen library used in Ceres Solver.
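The arithmetic structure of a blocked Cholesky decomposition that reuses the inverse of each diagonal sub-block can be sketched as follows. This is a minimal CPU-side illustration in plain Python, not the authors' OpenCL kernels: the right-looking loop order, the function names (`cholesky_unblocked`, `invert_lower`, `blocked_cholesky`) and the block-size parameter `bs` are all assumptions made for the example. In a GPU version, the small diagonal block and its inverse would be the natural candidates for local memory.

```python
import math


def cholesky_unblocked(a):
    """Plain Cholesky of a small SPD block; reads only the lower triangle."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


def invert_lower(l):
    """Inverse of a lower-triangular block by forward substitution."""
    n = len(l)
    inv = [[0.0] * n for _ in range(n)]
    for c in range(n):
        inv[c][c] = 1.0 / l[c][c]
        for r in range(c + 1, n):
            s = sum(l[r][k] * inv[k][c] for k in range(c, r))
            inv[r][c] = -s / l[r][r]
    return inv


def blocked_cholesky(a, bs):
    """Right-looking blocked Cholesky; returns lower-triangular L with A = L L^T."""
    n = len(a)
    w = [row[:] for row in a]  # working copy, overwritten with L
    for k0 in range(0, n, bs):
        k1 = min(k0 + bs, n)
        # Step 1: factor the small diagonal block.
        diag = [w[r][k0:k1] for r in range(k0, k1)]
        lkk = cholesky_unblocked(diag)
        # The inverse is computed once per block step and reused for
        # every row of the panel below the diagonal block.
        inv = invert_lower(lkk)
        for r in range(k0, k1):
            for c in range(k0, k1):
                w[r][c] = lkk[r - k0][c - k0]
        # Step 2: panel update, L_ik = A_ik * inv(L_kk)^T.
        for r in range(k1, n):
            row = w[r][k0:k1]
            for c in range(k1 - k0):
                w[r][k0 + c] = sum(row[m] * inv[c][m] for m in range(c + 1))
        # Step 3: trailing update on the lower triangle, A_ij -= L_ik * L_jk^T.
        for r in range(k1, n):
            for c in range(k1, r + 1):
                w[r][c] -= sum(w[r][m] * w[c][m] for m in range(k0, k1))
    # Clear the strict upper triangle so the result is exactly L.
    for r in range(n):
        for c in range(r + 1, n):
            w[r][c] = 0.0
    return w
```

Replacing the triangular solve in step 2 with a multiplication by the stored inverse turns the panel update into a plain matrix product, which is the kind of operation that maps well onto many small parallel work-groups.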
Authors
SHEN Yan
DAI Yuxing
College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; College of Mathematics, Physics and Electronic Information Engineering, Wenzhou University, Wenzhou, Zhejiang 325035, China
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core Journals
2019, Issue 2, pp. 284-289 (6 pages)
Funding
Key Project of the Natural Science Foundation of Zhejiang Province (LZ16E050002)