Multi-core optimization for conjugate gradient benchmark on heterogeneous processors

Abstract: Developing parallel applications on heterogeneous processors faces the 'memory wall' challenge, caused by the limited capacity of local storage, limited bandwidth, and long memory-access latency. To address this problem, a parallelization approach with six memory optimization schemes is proposed for the conjugate gradient (CG) benchmark; four of these schemes apply to all kinds of sparse matrix-vector multiplication (SPMV) operations. On an IBM QS20, the approach achieves speedups of up to 21 and 133 times for problem sizes A and B, respectively, compared with a single Power Processor Element. The conclusions are that SPMV can attain the peak memory-access bandwidth of the Cell BE, that simple computation is more efficient on heterogeneous processors, and that loop unrolling can hide local-storage access latency when scalar operations execute on SIMD cores.
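For readers unfamiliar with the kernels named in the abstract, the following is a minimal Python sketch of an SPMV routine with a 4-way unrolled inner loop, used inside a textbook CG iteration. It is an illustration only, not the authors' Cell BE implementation: the compressed sparse row (CSR) layout, the unroll factor of 4, the function names `spmv_csr` and `conjugate_gradient`, and the default iteration count and tolerance are all assumptions made here.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A x for a matrix A stored in CSR form (assumed layout)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        k, end = row_ptr[i], row_ptr[i + 1]
        # Four independent accumulators: on real hardware this lets
        # consecutive loads and multiplies overlap instead of serializing.
        acc0 = acc1 = acc2 = acc3 = 0.0
        while k + 4 <= end:
            acc0 += vals[k] * x[col_idx[k]]
            acc1 += vals[k + 1] * x[col_idx[k + 1]]
            acc2 += vals[k + 2] * x[col_idx[k + 2]]
            acc3 += vals[k + 3] * x[col_idx[k + 3]]
            k += 4
        # Remainder loop for rows whose length is not a multiple of 4.
        while k < end:
            acc0 += vals[k] * x[col_idx[k]]
            k += 1
        y[i] = (acc0 + acc1) + (acc2 + acc3)
    return y


def conjugate_gradient(row_ptr, col_idx, vals, b, iters=25, tol=1e-10):
    """Solve A x = b for a symmetric positive definite A (textbook CG)."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(b)          # search direction
    rho = dot(r, r)
    for _ in range(iters):
        if rho < tol:    # squared residual norm is small enough
            break
        q = spmv_csr(row_ptr, col_idx, vals, p)   # the dominant cost
        alpha = rho / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = dot(r, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return x
```

In Python the unrolling has no performance benefit; it is written out only to show the loop shape. The indirect access `x[col_idx[k]]` is the irregular memory pattern that makes SPMV bandwidth-bound and motivates memory optimization schemes of the kind the abstract describes.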

Authors: DENG Lin, DOU Yong

Affiliation: National Laboratory for Parallel and Distributed Processing

Source: Journal of Central South University (English Edition), indexed in SCIE, EI, CAS; 2011, Issue 2, pp. 490-498 (9 pages)

Funding: Project (2008AA01A201) supported by the National High-tech Research and Development Program of China; Projects (60833004, 60633050) supported by the National Natural Science Foundation of China

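Keywords: heterogeneous processor; optimization scheme; conjugate gradient; benchmark; multi-core; application development; memory access; parallelization; multi-core processor; NAS parallelization; CG; memory optimization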

Classification: TP332 [Automation and Computer Technology: Computer System Architecture]
