Runtime Environment Configuration and Optimization on High-performance Computing Clusters

Cited by: 5
Abstract: This paper addresses how to provide a good runtime environment on high-performance computing clusters so that parallel applications achieve higher performance. It identifies what runtime environment configuration and optimization must take into account while a cluster is in operation. The factors fall into two broad categories: the allocation and selection of resources across nodes, and the mapping and binding of processes and threads to hardware resources within a node. The paper analyzes how these factors affect parallel application performance. Tests of benchmarks and applications on several platforms confirm this influence: different runtime environment configurations can cause a performance difference of about 20% for an application. Finally, the specific work that remains to be done on runtime environment optimization is discussed in depth.
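Author: Cao Zongyan (曹宗雁)
Source: E-science Technology & Application (《科研信息化技术与应用》), 2011, No. 6, pp. 52-61 (10 pages)
Funding: Chinese Academy of Sciences 11th Five-Year Plan informatization special project "Supercomputing Environment Construction and Application" (INFO-115-B01); National 863 Program (2011AA01A205)
Keywords: High-performance computing; Cluster; Runtime environment; Performance optimization; Co-design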