Abstract
FT-M7002 is a high-performance DSP independently developed in China, with strong vector processing capability. To exploit its performance advantages, an efficient matrix decomposition algorithm for the FT-M7002 is urgently needed. Cholesky decomposition is a fast factorization method for symmetric positive definite matrices. This paper studies the optimization of the Cholesky decomposition algorithm on the FT-M7002 processor, improving its performance by generating an upper triangular matrix instead of a lower triangular one, manual vectorization, loop merging, loop unrolling, and software pipelining. The results show that the optimized algorithm achieves a speedup of 1.90 to 2.82 over the corresponding TI library function, and a speedup of 3.29 to 7.01 over the same TI library function once loop optimizations such as loop unrolling and software pipelining are applied, which is a marked improvement.
Authors
LI Huixiang; ZHANG Huifu (School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China; Key Laboratory for Service Computing and Novel Software Technology, Hunan University of Science and Technology, Xiangtan 411201, China)
Source
Journal of Shaoyang University: Natural Science Edition
2022, No. 3, pp. 9-17 (9 pages)
Funding
Scientific Research Project of the Hunan Provincial Department of Education (20B242).