Multi-core optimization for conjugate gradient benchmark on heterogeneous processors

Abstract: Developing parallel applications on heterogeneous processors faces the 'memory wall' challenge, owing to the limited capacity of local storage, limited bandwidth, and long memory-access latency. To address this problem, a parallelization approach with six memory optimization schemes was proposed for the conjugate gradient (CG) benchmark; four of these schemes target sparse matrix-vector multiplication (SPMV) operations of all kinds. Evaluated on an IBM QS20, the approach achieves speedups of up to 21 and 133 times for sizes A and B, respectively, compared with a single Power Processor Element. It is concluded that the peak memory-access bandwidth of the Cell BE can be attained in SPMV, that simple computation is more efficient on heterogeneous processors, and that loop unrolling can hide local-storage access latency when scalar operations are executed on SIMD cores.
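Authors: 邓林 (DENG Lin), 窦勇 (DOU Yong)
Source: Journal of Central South University (English Edition), indexed in SCIE, EI, CAS; 2011, Issue 2, pp. 490-498 (9 pages)
Funding: Project (2008AA01A201) supported by the National High-tech Research and Development Program of China; Projects (60833004, 60633050) supported by the National Natural Science Foundation of China
Keywords: heterogeneous processor; optimization scheme; conjugate gradient; benchmark; multi-core; application development; memory access; parallelization; multi-core processor; NAS parallelization; CG; memory optimization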
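
The abstract names SPMV as the kernel targeted by four of the optimization schemes and credits loop unrolling with hiding access latency. The sketch below illustrates that general idea in portable C and is not the authors' Cell BE code: it assumes the matrix is stored in compressed sparse row (CSR) format, and the function name `spmv_csr_unrolled`, its parameters, and the unroll factor of 4 are all hypothetical choices made for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: y = A * x for a sparse matrix A in CSR format.
 *   row_ptr has n_rows + 1 entries; row i owns val[row_ptr[i] .. row_ptr[i+1]-1].
 *   col_idx[k] is the column of the nonzero stored in val[k].
 * The inner loop is unrolled by 4 with independent accumulators so that
 * several loads and multiply-adds can be in flight at once.
 */
void spmv_csr_unrolled(size_t n_rows, const size_t *row_ptr,
                       const size_t *col_idx, const double *val,
                       const double *x, double *y)
{
    for (size_t i = 0; i < n_rows; ++i) {
        size_t k = row_ptr[i];
        const size_t end = row_ptr[i + 1];
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;

        /* Unrolled body: four nonzeros per iteration. */
        for (; k + 4 <= end; k += 4) {
            s0 += val[k]     * x[col_idx[k]];
            s1 += val[k + 1] * x[col_idx[k + 1]];
            s2 += val[k + 2] * x[col_idx[k + 2]];
            s3 += val[k + 3] * x[col_idx[k + 3]];
        }

        /* Combine partial sums, then handle the 0..3 leftover nonzeros. */
        double sum = (s0 + s1) + (s2 + s3);
        for (; k < end; ++k) {
            sum += val[k] * x[col_idx[k]];
        }
        y[i] = sum;
    }
}
```

In a CG iteration a routine like this would supply the matrix-vector product, with the remaining work being dot products and vector updates. Note that splitting the sum across accumulators changes the floating-point summation order relative to a plain loop, so results may differ in the last bits for general inputs.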