
ArchSim: A System-Level Parallel Simulation Platform for the Architecture Design of High Performance Computer

Abstract: A high performance computer (HPC) is a large, complex system whose architecture design faces increasing difficulties and risks. Traditional methods, such as theoretical analysis, component-level simulation and sequential simulation, are not applicable to system-level simulations of HPC systems. Even parallel simulation on large-scale parallel machines faces many difficulties in scalability, reliability, generality, and efficiency. To meet the current needs of HPC architecture design, this paper proposes a system-level parallel simulation platform: ArchSim. We first introduce the architecture of the ArchSim simulation platform, which is composed of a global server (GS), local server agents (LSA) and entities. Second, we describe the key techniques of ArchSim, including the synchronization protocol, the communication mechanism and the distributed checkpointing/restart mechanism. We then test the main performance indices of ArchSim with the phold benchmark and analyze the extra overhead introduced by ArchSim. Finally, based on ArchSim, we construct a parallel event-driven interconnection network simulator and a system-level simulator for a small-scale HPC system with 256 processors. The results of the performance test and the HPC system simulations demonstrate that ArchSim can achieve a high speedup ratio and high scalability on a parallel host machine and can support system-level simulations for the architecture design of HPC systems.
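For readers unfamiliar with the phold benchmark named in the abstract, the following is a minimal sequential sketch of the general PHOLD workload, not ArchSim's implementation or API. It assumes a fixed population of events per logical process (LP), where each processed event schedules one new event on a randomly chosen LP after a lookahead plus an exponentially distributed delay. The function name `run_phold` and all parameters are hypothetical and chosen only for illustration.

```python
import heapq
import random


def run_phold(num_lps=8, events_per_lp=4, end_time=100.0,
              mean_delay=1.0, lookahead=0.1, seed=0):
    """Sequential PHOLD sketch (illustrative only, not ArchSim code).

    Returns (events processed, per-LP event counts, events still pending).
    """
    rng = random.Random(seed)
    queue = []  # min-heap of (timestamp, sequence number, destination LP)
    seq = 0     # tie-breaker so equal timestamps pop in insertion order

    # Seed every logical process (LP) with its initial event population.
    for lp in range(num_lps):
        for _ in range(events_per_lp):
            ts = lookahead + rng.expovariate(1.0 / mean_delay)
            heapq.heappush(queue, (ts, seq, lp))
            seq += 1

    per_lp = [0] * num_lps
    processed = 0
    while queue and queue[0][0] <= end_time:
        now, _, lp = heapq.heappop(queue)
        per_lp[lp] += 1
        processed += 1
        # Each handled event schedules exactly one new event on a random LP,
        # so the total event population stays constant.
        dest = rng.randrange(num_lps)
        delay = lookahead + rng.expovariate(1.0 / mean_delay)
        heapq.heappush(queue, (now + delay, seq, dest))
        seq += 1
    return processed, per_lp, len(queue)


if __name__ == "__main__":
    total, counts, pending = run_phold()
    print("events processed:", total)
    print("per-LP counts:", counts)
    print("pending events:", pending)
```

In a parallel simulator the LPs would be distributed across host processors and the event rate measured against the processor count; the sketch only shows the workload's event pattern, which is what makes phold a convenient stress test for synchronization and communication overhead.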
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), SCIE, EI, CSCD. 2009, Issue 5, pp. 901-912 (12 pages)
Funding: Supported by the National High Technology Research and Development 863 Program of China under Grant No. 2007AA01Z117 and the National Basic Research 973 Program of China under Grant No. 2007CB310900.
Keywords: high performance computer architecture, system-level parallel simulation, synchronization protocol, message communication, distributed checkpointing/restart

