
Diagnostic methods for communication waiting in MPI parallel programs and their application (cited by: 1)
Abstract: As the scale of parallel systems grows, existing methods for diagnosing communication waiting incur large memory overhead and long measurement times. Based on an in-depth analysis of these methods, and taking into account the practical need for controllable measurement overhead, a diagnosis model for communication waiting based on hotspot functions is established. From this model, a leaner and more practical diagnostic method is derived. The method is applied to diagnose communication waiting in several large-scale MPI parallel programs, including the two-dimensional LARED integrated program, LARED-S, and LAP3D. The results show that the method can accurately locate the key code segments that cause communication waiting, and that the optimization proposals and estimated room for performance improvement it provides are a useful reference for subsequent program improvement. In particular, after optimization guided by the diagnostic results, LARED-S performance improved by 32% and its communication waiting time fell by 44%.
Authors: WU Linping; JING Cuiping; LIU Xu; TIAN Hongyun (Institute of Applied Physics and Computational Mathematics, Beijing 100094, China)
Source: Journal of National University of Defense Technology, 2020, No. 2, pp. 47-54 (8 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
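Funding: National Key R&D Program of China (2018YFB0204003); National Natural Science Foundation of China (61672003); National Natural Science Foundation of China, Young Scientists Fund (11601034).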
Keywords: communication waiting; MPI parallel programs; load balance; performance diagnosis
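The keywords tie communication waiting to load balance. As a rough illustration of that link, the sketch below estimates how long each rank waits at a synchronizing call, given the time each rank spent in one hotspot function beforehand. This is not the authors' diagnosis model, which this record does not detail. The function name `waiting_profile`, the sample timings, and the simplification that every rank waits for the slowest one are all assumptions made for the example.

```python
from statistics import mean


def waiting_profile(compute_times):
    """Estimate waiting at a synchronizing call from per-rank compute times.

    compute_times: seconds each rank spent in the hotspot function before
    the synchronizing call (hypothetical measurements).
    Assumption: every rank blocks until the slowest rank arrives.
    """
    slowest = max(compute_times)
    avg = mean(compute_times)
    waits = [slowest - t for t in compute_times]
    return {
        "waits": waits,                      # per-rank waiting time
        "total_wait": sum(waits),            # waiting summed over ranks
        "imbalance": slowest / avg - 1.0,    # 0.0 means perfectly balanced
        # Upper bound on the elapsed-time saving if the work were
        # redistributed evenly across ranks.
        "max_saving_fraction": (slowest - avg) / slowest,
    }


# Hypothetical timings for four ranks; rank 3 carries twice the load.
times = [4.0, 4.0, 4.0, 8.0]
profile = waiting_profile(times)
for rank, wait in enumerate(profile["waits"]):
    print(f"rank {rank}: waits {wait:.1f} s")
print(f"bound on saving: {profile['max_saving_fraction']:.1%}")
```

In a real MPI program the per-rank times would come from instrumentation around the hotspot function (for example with `MPI_Wtime`); the bound only indicates how much a perfectly even redistribution could recover in this simplified model.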