期刊文献+

迭代方法中基于渐近规模的通信与计算比分析 被引量:6

Analysis of Communication-to-Computation Based on Asymptotic Size for Iterative Methods
下载PDF
导出
摘要 迭代方法是科学计算中求解大规模稀疏线性代数方程组最常用的方法.大量实际应用表明,迭代方法通常具有较高的通信与计算比,只有在粗粒度并行下才能取得较好的并行可扩展性能.而实际应用大规模计算的需求和当前多核/众核体系结构的发展趋势要求迭代方法具备细粒度并行可扩展能力.文中引入渐近规模,即满足加速条件的计算规模下界,来反映并行迭代方法适应细粒度并行的能力,并由此刻画通信与计算比.基于矩阵的稀疏模式及其通信模式、机器的通信参数和迭代方法的基本运算,给出了渐近规模的理论预测公式.在一台包含128个双路4核计算节点的并行机上,分别基于纯进程并行(MPI)和进程/线程混合并行(MPI/OpenMP),以实际应用中3种常用迭代方法Jacobi、CG、BiCGSTAB为例,分析其渐近规模.并行可扩展性测试表明了渐近规模用于刻画迭代方法通信与计算比的准确性.对于纯进程情形,给出了渐近规模的理论预测与实际测试的对比,表明了理论预测结果的正确性.最后,基于这些结果,从迭代方法的算法设计和并行实现等方面讨论了面向未来更大规模的计算系统,降低通信与计算比的途径. Iterative method are one of the most efficient numerical algorithms to solve large-scale sparse linear systems arising from scientific computing. The parallel scalability of iterative method can be measured by the communication-to-computation (CtC) during the iterative process. The CtC is high for many iterative methods, as a result, coarse-grain parallelism is needed in order to obtain expected scalability. However, fine-grain parallelism is required for the complicacy of architecture increasing with multi/many-cores. In this paper, we introduce the concept of asymptotic size, which defined as the low bound of the problem size such that satisfying the speedup condition that parallel speedup is more than 1. We hope the asymptotic size can be used to describe the CtC and the ability of fine-grain parallelism for iterative method. Moreover, the theoretical prediction formula of asymptotic size is obtained based on the following parameters, the sparse and communication pattern of matrix, communication parameters of machine, and combination of basic operations of iterative methods. Using asymptotic size, the CtC is analyzed for three popular iterative methods, including Jacobi, CG, BiCGSTAB, on a MPP machine with 128 Double Quad-core nodes. The performance results are given for both MPI-Only and MPI/ OpenMP programming model, which show the usefulness of the asymptotic size for describing the CtC of iterative methods. For MPI-Only case, we also give the comparison of the prediction results and experiment results, which show the validation of the formula of asymptotic size. Finally, future research topics for improving the scalability of the iterative methods on more power supercomputers also discussed based on the analysis conclusions.
出处 《计算机学报》 EI CSCD 北大核心 2013年第4期782-789,共8页 Chinese Journal of Computers
基金 国家"九七三"重点基础研究发展规划项目基金(2011CB309702) 国家"八六三"高技术研究发展计划项目基金(2012AA01A309)资助~~
关键词 迭代方法 通信与计算比 并行可扩展 渐近规模 多核 众核体系结构 iterative method communication-to-computation parallel scalability asymptoticsize multi/many-cores
  • 相关文献

参考文献8

  • 1Saad Y. Iterative Methods for Sparse Linear Systems. 2ndEdition. Philadelphia:SIAM Publisher, 2003.
  • 2Ang J, Evans K, Geist A, Heroux M, Hovland P, MarquesO,Curfman L,Ng E,Wild S. Extreme-scale solvers:Transi-tion to future architectures. DOE ASCR Report 2012-3,2012.
  • 3Gahvari H, Baker H,Schuiz M, Yang U, Jordan K,GroppW. Modeling the performance of an algebraic multigrid cycleon HPC platforms//Proceedings of the 25th InternationalConference on Super Computing (ICS 2011 ). Tucson, AZ,2011:172-181.
  • 4Ghysels P, Ashby T,Meerbergen K, Vanroose W. Hidingglobal communication latency in the GMRES algorithm onmassively parallel machines. Intel Exascience Lab, Leuven,Belgium:Technical Report 04. 2012. 1,2012.
  • 5莫则尧.实用的并行程序性能分析方法[J].数值计算与计算机应用,2000,21(4):266-275. 被引量:6
  • 6徐小文,莫则尧.并行代数多重网格算法可扩展性能分析[J].计算物理,2007,24(4):387-394. 被引量:9
  • 7Mo Ze-Yao, Zhang Ai-Qing, Cao Xiao-Lin, Liu Qing-Kai,Xu Xiao-Wen, An Heng-Bin, Pei Wen-Bin,Zhu Shao-Ping.JASMIN:A parallel software infrastructure for scientificcomputing. Frontiers of Computer Science in China,2010,4(4):480-488.
  • 8Ballard G, Demmel J, Holtz 0? Schwartz O. Minimizingcommunication in numerical linear algebra. SIAM Journal onMatrix Analysis and Applications, 2011, 32 (3):866-901.

二级参考文献5

共引文献13

同被引文献57

  • 1徐小文,莫则尧.一种新的并行代数多重网格粗化算法[J].计算数学,2005,27(3):325-336. 被引量:7
  • 2徐小文,莫则尧.并行代数多重网格算法可扩展性能分析[J].计算物理,2007,24(4):387-394. 被引量:9
  • 3武林平,魏勇,刘旭.多核集群中系统嗓音的测最[C]//2012全国高性能计算学术年会.北京:中国计算机学会,2012:1-5.
  • 4Gioiosa R, Petrini F, Davis K, et al. Analysis of system overhead on parallel computers [C] //Proe of the 4th IEEE Int Symp on Signal Processing and Information Technology. Piscataway, NJ: IEEE, 2004:387-390.
  • 5Beckman P, lskra K, Yoshii K, et al. Benchmarking the effects of operating system interference on extreme-scale parallel machines [J]. Cluster Computing, 2008, 11 (1) : 3- 16.
  • 6Herowx M A. HPCCG; A simple conjugate gradient benchmark code for a 3D chimney domain on an arbitrary numher of processors [CP/OL]. [2014-03-13]. http://www. mantevo, org/downloads/HPCCG-1.0, html.
  • 7Hoefler T, Schneider T, i.urusdaine A. Characterizing the influence of system noise on large scale applications hy simulation [C] //Proc of the 2010 ACM/IEEE Int Conf for High Performance Computing, Networking, Storage and Analysis. Piseataway, NJ: IEEE, 2010:1-11.
  • 8Intel. Intel 64 and IA-32 Architectures Software Developer's Manual, B: Instruction Set Reference, N-Z [M]. Santa Clara, California: Intel Corporation, 2010:251-252.
  • 9Dhabaleswar K. Osu micro-benchmarks [CP/OL]. [2014-03- 13]. http://mvapieh, cse. ohio-state, edu/benchmarks/.
  • 10Petrini F, Kerbyson D K, Pakin S. The case of the missing supercomputer performance: Achieving opllmal performance on the 8192 processors of ASCI Q [C] //Proc of the 2003 ACM/IEEE Con{ on Supereamputing. Piseataway, N J: IEEE, 2003, 55-55.

引证文献6

二级引证文献19

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部