Funding: National Natural Science Foundation of China (69603005), and the Science Foundation of Shanghai Municipal Commission of Sc
Abstract: Using commodity SMPs (shared memory processors) to build cluster-based supercomputers has become a mainstream trend. Yet programming this kind of supercomputer system requires an environment that supports both message passing and shared memory programming. This paper describes our preliminary work on targeting a BSP library at clusters of SMPs. To exploit the maximum performance potential that a cluster of SMPs offers, we adopt threading techniques to reduce system overhead and to exploit the capacity of SMPs. A fore-layer synchronization mechanism is proposed to support barrier synchronization within an SMP node, within a group of SMP nodes, and across the whole cluster, respectively. A comparison is made between our BSP library and currently available BSP libraries such as PUB.
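The layered barrier described in this abstract can be illustrated with a minimal sketch. It is not the authors' BSP library: it assumes only two levels (node and cluster), simulates every SMP node with threads in a single Python process, and uses a second `threading.Barrier` as a stand-in for the inter-node message-passing step. The names `HierarchicalBarrier` and `run_demo` are hypothetical.

```python
import threading


class HierarchicalBarrier:
    """Two-level barrier sketch: sync inside a node, then across nodes."""

    def __init__(self, num_nodes, threads_per_node):
        # One local barrier per simulated SMP node (shared-memory level).
        self.local = [threading.Barrier(threads_per_node) for _ in range(num_nodes)]
        # One barrier among node leaders (stands in for inter-node messages).
        self.cluster = threading.Barrier(num_nodes)

    def wait(self, node_id):
        # Arrival phase: every thread of the node checks in locally.
        index = self.local[node_id].wait()
        # Exactly one thread per node receives index 0 and acts as leader.
        if index == 0:
            self.cluster.wait()
        # Release phase: nobody leaves until the leader is back.
        self.local[node_id].wait()


def run_demo(num_nodes=2, threads_per_node=3):
    """Return, per thread, how many arrivals it could see after the barrier."""
    barrier = HierarchicalBarrier(num_nodes, threads_per_node)
    arrived, seen = [], []
    lock = threading.Lock()

    def worker(node_id):
        with lock:
            arrived.append(node_id)
        barrier.wait(node_id)
        with lock:
            seen.append(len(arrived))

    threads = [
        threading.Thread(target=worker, args=(node,))
        for node in range(num_nodes)
        for _ in range(threads_per_node)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Only one thread per node takes part in the cluster-level step, which is the usual motivation for layering: the number of inter-node participants equals the number of nodes rather than the number of threads.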
Abstract: Shared Memory Processors (SMP) workstation clusters are becoming increasingly popular. To optimize communication between the workstations, a new graph partition problem was formulated to schedule tasks in SMP clusters. The problem is NP-complete, and a heuristic algorithm was developed based on Lee, Kim and Park's algorithm. Experimental results indicate that our algorithm outperforms theirs, especially when the number of partitions is large. This algorithm can be integrated into a parallelizing compiler as a back-end optimizer for the distributed code generator.
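The abstract does not spell out its partition objective or its heuristic, so the following is only a generic illustration of why task placement matters for communication. It assumes a weighted task graph whose edge weights are communication volumes and scores a placement by the total weight of edges that cross workstations. The function name `cut_cost` and the sample graph are hypothetical.

```python
def cut_cost(edges, assignment):
    """Sum the weights of edges whose two tasks sit on different workstations.

    edges: iterable of (task_u, task_v, weight) tuples.
    assignment: dict mapping each task to a workstation id.
    """
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])


# Hypothetical task graph: a chain a-b-c-d with communication weights.
edges = [("a", "b", 5), ("b", "c", 1), ("c", "d", 4)]

# Heavy communicators placed together: only the light b-c edge is cut.
good = {"a": 0, "b": 0, "c": 1, "d": 1}
# Alternating placement: every edge crosses workstations.
bad = {"a": 0, "b": 1, "c": 0, "d": 1}
```

Here `cut_cost(edges, good)` is 1 and `cut_cost(edges, bad)` is 10. A partitioning heuristic searches for placements with a low score under whatever balance constraints the problem imposes.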
Funding: Supported by the "985" Basic Research Foundation of Tsinghua University of China (No. JC2001024)
Abstract: A hybrid decomposition method for molecular dynamics simulations is presented that uses spatial decomposition and force decomposition simultaneously to fit the architecture of a cluster of symmetric multi-processor (SMP) nodes. The method distributes particles between nodes based on the spatial decomposition strategy to reduce inter-node communication costs. The method also partitions particle pairs within each node using the force decomposition strategy to improve the load balance for each node. Simulation results for a nucleation process with 4 000 000 particles show that the hybrid method achieves better parallel performance than either spatial or force decomposition alone, especially when applied to a large-scale particle system with non-uniform spatial density.
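The two levels of the hybrid scheme can be sketched as follows. This is a toy illustration and not the paper's implementation: it assumes a one-dimensional box with non-negative coordinates, equal-width slabs per node, all-pairs interactions inside a node, and round-robin dealing of pairs to workers. It also ignores pairs that span two nodes. The function names are hypothetical.

```python
from itertools import combinations


def spatial_decompose(positions, num_nodes, box_length):
    """Assign each particle index to a node according to its 1D slab."""
    slab = box_length / num_nodes
    nodes = [[] for _ in range(num_nodes)]
    for i, x in enumerate(positions):
        # Clamp so a particle exactly at box_length lands in the last slab.
        node = min(int(x / slab), num_nodes - 1)
        nodes[node].append(i)
    return nodes


def force_decompose(particles, num_workers):
    """Deal the particle pairs of one node round-robin across its workers."""
    workers = [[] for _ in range(num_workers)]
    for k, pair in enumerate(combinations(particles, 2)):
        workers[k % num_workers].append(pair)
    return workers


# Non-uniform density: four particles crowd the first slab, one sits in the second.
positions = [0.1, 0.2, 0.3, 0.4, 9.5]
nodes = spatial_decompose(positions, num_nodes=2, box_length=10.0)
pairs_per_worker = force_decompose(nodes[0], num_workers=2)
```

In this example the crowded node holds six pairs, and each of its two workers receives three of them. Dividing pairs rather than particles is what keeps the workers inside a node evenly loaded even when the particles themselves are clustered.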