Abstract: In recent years, high-performance scientific computing on workstation clusters connected by local area networks has become an active research topic. Because network latency and protocol-processing overhead are large compared with the computing power of a single workstation, it is increasingly important to balance not only the numerical load but also the communication load, and to overlap communication with computation during parallel execution. Efficiency evaluation rules must therefore reveal these properties of a given parallel algorithm so that the existing algorithm can be optimized to attain its highest parallel efficiency. Traditional efficiency evaluation rules are no longer adequate for this task. Building on Culler's detailed discussion of interconnection networks for MPP systems in the LogP model, this paper presents a system of efficiency evaluation rules for parallel computations on workstation clusters using the PVM 3.0 parallel software framework. These rules satisfy the above requirements. Finally, two typical applications, one synchronous and one asynchronous, are designed to verify the validity of these rules on a cluster of four SGI workstations connected by Ethernet.
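The abstract does not state the evaluation rules themselves. As a rough illustration of why the LogP parameters matter for such rules, the sketch below assumes the standard LogP cost for k small messages sent back-to-back between two processors, (k - 1)·max(g, o) + 2o + L, and a hypothetical synchronous iteration in which each of P processors computes work/P and exchanges one aggregated message per step. The names `step_time` and `efficiency` and all parameter values are illustrative assumptions, not the paper's rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    """LogP machine parameters; time units are arbitrary."""
    L: float  # upper bound on network latency for a small message
    o: float  # CPU overhead to send or to receive one message
    g: float  # minimum gap between consecutive messages at one processor
    P: int    # number of processors

    def send_time(self, k: int = 1) -> float:
        """Time until the last of k small messages sent back-to-back
        from one processor has been received by another."""
        if k < 1:
            raise ValueError("k must be at least 1")
        return (k - 1) * max(self.g, self.o) + 2 * self.o + self.L


def step_time(m: LogP, work: float, overlap: bool) -> float:
    """Hypothetical time of one synchronous iteration: each processor
    computes work / P and exchanges one aggregated message."""
    compute = work / m.P
    if overlap:
        # Only the send and receive overheads occupy the CPU; the latency L
        # is hidden behind computation when the computation is long enough.
        return max(compute + 2 * m.o, m.send_time(1))
    # Without overlap, the full message time is added to the computation.
    return compute + m.send_time(1)


def efficiency(m: LogP, work: float, overlap: bool) -> float:
    """Parallel efficiency E = T1 / (P * Tp), taking T1 = work."""
    return work / (m.P * step_time(m, work, overlap))


if __name__ == "__main__":
    m = LogP(L=10.0, o=2.0, g=4.0, P=4)
    print(m.send_time(1), m.send_time(3))                 # 14.0 22.0
    print(round(efficiency(m, 400.0, overlap=False), 3))  # 0.877
    print(round(efficiency(m, 400.0, overlap=True), 3))   # 0.962
```

With these illustrative numbers, hiding the latency L behind computation raises the estimated efficiency from about 0.877 to about 0.962, because only the overhead o remains on the critical path. A rule that reports L, o, and g separately can therefore show how much an algorithm stands to gain from overlapping communication with computation.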
Funding: the National Natural Science Foundation of China (Grant Nos. 49775268 and 49823002); the China National Key Development Planni
Abstract: A new version of the Institute of Atmospheric Physics (IAP) 9-Layer (9L) atmospheric general circulation model (AGCM) suitable for massively parallel processor (MPP) systems has been developed. This paper presents the principles of the parallel code design and examines the model's performance on a variety of state-of-the-art parallel computers in China. Parallelism is achieved through a domain decomposition strategy implemented with the Message Passing Interface (MPI). Only the one-dimensional domain decomposition algorithm is shown to scale favorably as the number of processors increases.
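As a hedged illustration of one-dimensional domain decomposition, the sketch below splits the rows of a global grid into contiguous bands, one per process, and fills one ghost row on each side of every band from its neighbors. It assumes the decomposition runs along latitude rows and that a single ghost row suffices; the abstract does not say which dimension or halo width the IAP 9L AGCM uses, and `block_ranges`, `neighbors`, and `exchange_halos` are hypothetical names. In an MPI code, the serial copy in `exchange_halos` would be replaced by message exchanges between neighboring ranks, for example with `MPI_Sendrecv`.

```python
from __future__ import annotations


def block_ranges(n_rows: int, n_procs: int) -> list[tuple[int, int]]:
    """Split rows 0..n_rows-1 into n_procs contiguous bands [start, stop);
    the first n_rows % n_procs bands get one extra row."""
    if not 1 <= n_procs <= n_rows:
        raise ValueError("need 1 <= n_procs <= n_rows")
    base, extra = divmod(n_rows, n_procs)
    ranges, start = [], 0
    for rank in range(n_procs):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def neighbors(rank: int, n_procs: int) -> tuple[int | None, int | None]:
    """Ranks owning the bands just south and north of `rank`
    (None where the band touches a pole)."""
    south = rank - 1 if rank > 0 else None
    north = rank + 1 if rank < n_procs - 1 else None
    return south, north


def exchange_halos(bands: list[list[list[float]]]) -> list[list]:
    """Serial stand-in for the MPI halo exchange: each band (a list of rows)
    gets its neighbors' adjacent edge rows as one ghost row on each side."""
    padded = []
    for rank, band in enumerate(bands):
        south, north = neighbors(rank, len(bands))
        lower = bands[south][-1] if south is not None else None
        upper = bands[north][0] if north is not None else None
        padded.append([lower] + band + [upper])
    return padded


if __name__ == "__main__":
    rows = [[float(i)] for i in range(10)]   # toy grid: 10 rows, 1 column
    ranges = block_ranges(len(rows), 4)
    print(ranges)                            # [(0, 3), (3, 6), (6, 8), (8, 10)]
    bands = [rows[a:b] for a, b in ranges]
    for rank, band in enumerate(exchange_halos(bands)):
        print(rank, band)
```

Each process communicates only with at most two neighbors, so the number of messages per step stays constant as processes are added; the trade-off is that the number of processes cannot exceed the number of rows in the decomposed dimension.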