
A Cholesky decomposition vector processing algorithm for FT-M7002

Times cited: 1
Abstract: FT-M7002 is a high-performance DSP independently developed in China, with strong vector processing capability. To fully exploit its performance advantages, an efficient matrix decomposition algorithm for FT-M7002 is urgently needed. Cholesky decomposition is a fast factorization method for symmetric positive definite matrices. This paper investigates how to optimize the Cholesky decomposition algorithm on the FT-M7002 processor, improving its performance by generating an upper triangular matrix instead of a lower triangular one, manual vectorization, loop merging, loop unrolling, and software pipelining. The results show that the optimized algorithm achieves speedups of 1.90 to 2.82 over the corresponding TI library function, and speedups of 3.29 to 7.01 over that function after loop optimizations such as loop unrolling and software pipelining are applied, a clear performance gain.
Authors: LI Huixiang (李慧祥), ZHANG Huifu (张会福) (School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China; Key Laboratory for Service Computing and Novel Software Technology, Hunan University of Science and Technology, Xiangtan 411201, China)
Source: Journal of Shaoyang University: Natural Science Edition (《邵阳学院学报(自然科学版)》), 2022, No. 3, pp. 9-17 (9 pages)
Funding: Scientific Research Project of the Hunan Provincial Department of Education (20B242).
Keywords: Cholesky decomposition; digital signal processor; single instruction multiple data