Abstract
This paper presents a fine-grained pipelined parallel algorithm for Cholesky decomposition. The algorithm handles matrices of arbitrary order and can fully exploit the fine-grained parallelism offered by FPGA accelerators. Experimental results show that it scales well: 36 processing elements (PEs) can be integrated into a Xilinx XC5VLX330 FPGA, reaching 14.3 GFLOPS for a matrix of order 16384 at a clock frequency of 200 MHz.
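The abstract does not spell out the factorization itself. As a point of reference, here is a minimal sketch (Python, standard library only) of the textbook left-looking, column-by-column Cholesky factorization A = LL^T. It is not the paper's pipelined FPGA algorithm, and the function names `cholesky` and `reconstruct` are hypothetical. The sketch shows where the fine-grained parallelism lies: each below-diagonal entry in column j is an independent dot product over the first j columns. This kind of multiply-accumulate work is what a pipeline of PEs could stream.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with a == L * L^T.

    Hypothetical helper: textbook left-looking Cholesky, not the paper's
    pipelined FPGA mapping. `a` must be symmetric positive definite.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract squares of entries already computed in row j.
        s = a[j][j] - sum(l[j][k] * l[j][k] for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(s)
        # Below-diagonal entries of column j: each one is an independent
        # dot product over the first j columns (fine-grained parallel work).
        for i in range(j + 1, n):
            dot = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - dot) / l[j][j]
    return l


def reconstruct(l: Matrix) -> Matrix:
    """Hypothetical check helper: compute L * L^T."""
    n = len(l)
    return [[sum(l[i][k] * l[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    a = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
    print(cholesky(a))  # [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
```

For scale, the factorization needs about n³/3 floating-point operations. For n = 16384 that is 2^42/3 ≈ 1.47 × 10^12 operations, which at the reported 14.3 GFLOPS comes to roughly 100 seconds. Similarly, assume each of the 36 PEs completes one multiply and one add per cycle at 200 MHz. The peak would then be 36 × 2 × 200 MHz = 14.4 GFLOPS. Both figures are back-of-envelope estimates under these assumptions, not results reported in the paper.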
Source
Computer Engineering & Science
Indexed in: CSCD; Peking University Core Journals
2010, No. 9, pp. 102-106, 164 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 60633050 and 60833004)