A Fine-Grained Parallel Algorithm for the Cholesky Decomposition (cited by: 6)
Abstract: This paper presents a fine-grained pipelined parallel algorithm for the Cholesky decomposition. The algorithm handles matrices of arbitrary order and exploits the fine-grained parallelism offered by FPGA accelerators. Experiments show that it scales well: 36 processing elements (PEs) fit on a Xilinx XC5VLX330 FPGA, and at a clock frequency of 200 MHz the design reaches 14.3 GFLOPS for a matrix of order 16384.
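Source: Computer Engineering & Science (《计算机工程与科学》; indexed in CSCD and the Peking University core journal list), 2010, No. 9, pp. 102-106, 164 (6 pages)
Funding: National Natural Science Foundation of China (grants 60633050, 60833004)
Keywords: Cholesky decomposition; fine-grained parallelism; FPGA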
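
For readers unfamiliar with the factorization named in the abstract, the following is a minimal sequential sketch in Python of a textbook column-oriented Cholesky decomposition, together with a back-of-the-envelope check of the reported figures. It is not the paper's pipelined FPGA algorithm, which this record does not describe in detail. The function names, the leading-order flop count n^3/3, and the `flops_per_pe_per_cycle` parameter are assumptions introduced here for illustration only.

```python
import math


def cholesky_lower(a):
    """Return lower-triangular L with L * L^T == a.

    `a` is a symmetric positive-definite matrix given as a list of lists.
    Textbook column-by-column variant, not the paper's hardware design.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: remove the contribution of columns 0..j-1.
        d = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j. The rows i are mutually
        # independent, which is the kind of fine-grained parallelism a set
        # of processing elements could exploit.
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def cholesky_flops(n):
    """Leading-order floating-point operation count, n^3 / 3 (assumption)."""
    return n ** 3 / 3.0


def estimated_seconds(n, gflops):
    """Run time implied by a sustained rate, using the n^3 / 3 estimate."""
    return cholesky_flops(n) / (gflops * 1e9)


def peak_gflops(num_pes, clock_mhz, flops_per_pe_per_cycle):
    """Hypothetical peak rate; flops_per_pe_per_cycle is an assumption,
    not a value stated in the abstract."""
    return num_pes * clock_mhz * 1e6 * flops_per_pe_per_cycle / 1e9


if __name__ == "__main__":
    A = [[4.0, 12.0, -16.0],
         [12.0, 37.0, -43.0],
         [-16.0, -43.0, 98.0]]
    for row in cholesky_lower(A):
        print(row)
    print(estimated_seconds(16384, 14.3))
    print(peak_gflops(36, 200, 2))
```

Under the n^3/3 estimate, a matrix of order 16384 at 14.3 GFLOPS corresponds to a run time on the order of 100 seconds. If each PE were assumed to complete one multiplication and one addition per cycle, 36 PEs at 200 MHz would peak at 14.4 GFLOPS, close to the reported figure. That per-PE throughput is a guess made here, not a statement from the source.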

