Funding: Supported by the Natural Science Foundation of China (No. 60373061).
Abstract: The dynamic distribution model is one of the best schemes for parallel volume rendering. However, in a homogeneous cluster system, where the granularity is traditionally identical, all processors communicate almost simultaneously and the computation load may become unbalanced. To address these problems, a dynamic distribution model with prime granularity for parallel computing is presented. The granularities of the processors are relatively prime, and the related theory is introduced. High parallel performance can be achieved by minimizing network contention and by using a load balancing strategy that ensures all processors finish almost simultaneously. Based on the Master-Slave-Gleaner (MSG) scheme, the parallel Splatting algorithm for volume rendering is used to test the model on an IBM Cluster 1350 system. The experimental results show that the model brings a considerable improvement in performance, including computation efficiency, total execution time, speed, and load balancing.
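The effect of relatively prime granularities on simultaneous communication can be illustrated with a toy model. The sketch below is not the paper's model: it assumes a homogeneous cluster in which a worker with granularity g finishes a chunk, and so contacts the master, every g time steps. The function names `pairwise_coprime` and `count_collisions` are hypothetical.

```python
from itertools import combinations
from math import gcd


def pairwise_coprime(granularities):
    """Return True if every pair of granularities is relatively prime."""
    return all(gcd(a, b) == 1 for a, b in combinations(granularities, 2))


def count_collisions(granularities, horizon):
    """Count time steps in 1..horizon at which two or more workers
    request new work at once.

    Toy assumption: a worker with granularity g requests work at every
    multiple of g (identical processing speed on all nodes).
    """
    collisions = 0
    for t in range(1, horizon + 1):
        requesters = sum(1 for g in granularities if t % g == 0)
        if requesters >= 2:
            collisions += 1
    return collisions


identical = [4, 4, 4]
coprime = [3, 4, 5]
print(pairwise_coprime(identical), count_collisions(identical, 60))  # False 15
print(pairwise_coprime(coprime), count_collisions(coprime, 60))      # True 10
```

Under this assumption, workers with identical granularity collide on every request, whereas workers with relatively prime granularities meet only at common multiples of their chunk sizes.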
Abstract: The Characteristic Basis Function Method (CBFM) is a novel approach for analyzing electromagnetic (EM) scattering from electrically large objects. Because it divides the object under study into small blocks, the CBFM is well suited to parallel computing. In this paper, a static load-balancing parallel method is presented that combines the Message Passing Interface (MPI) with the Adaptively Modified CBFM (AMCBFM). In this method, the object geometry is partitioned into distinct blocks, and the serial numbers of the blocks are sent to the corresponding nodes according to a certain rule. Each node only needs to compute the information for its local blocks. The results confirm the accuracy and efficiency of the proposed method in speeding up the solution of electrically large problems.
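The abstract does not say which rule maps block numbers to nodes. As an illustration only, the sketch below assumes a cyclic (round-robin) rule, a common choice for static distribution; the function name `assign_blocks` and the rule itself are assumptions, not the paper's method.

```python
def assign_blocks(num_blocks, num_nodes):
    """Statically map block indices 0..num_blocks-1 to nodes.

    Hypothetical rule: block b goes to node b % num_nodes, so block
    counts per node differ by at most one.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    assignment = {node: [] for node in range(num_nodes)}
    for block in range(num_blocks):
        assignment[block % num_nodes].append(block)
    return assignment


# Each node would then process only the blocks listed under its own rank.
print(assign_blocks(7, 3))  # {0: [0, 3, 6], 1: [1, 4], 2: [2, 5]}
```

Because the mapping is fixed before the computation starts, no work is exchanged at run time, which is what makes the scheme static.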
Abstract: The rapid growth of interconnected high-performance workstations has produced a new computing paradigm called cluster-of-workstations computing. In these systems, the load balancing problem is a serious impediment to achieving good performance. The main concern of this paper is the implementation of a dynamic load balancing algorithm, asynchronous Round Robin (ARR), for balancing the workload of a parallel depth-first-search tree computation on a Cluster of Heterogeneous Workstations (COW). Many algorithms in artificial intelligence and other areas of computer science are based on depth-first search in implicitly defined trees. These algorithms require a load-balancing scheme that can evenly distribute parts of an irregularly shaped tree over the workstations with minimal interprocessor communication and without prior knowledge of the tree's shape. The ARR algorithm communicates between processors only when necessary, and it runs under MPI (Message Passing Interface), which allows parallel execution on a heterogeneous cluster of SUN workstations. The program is written in C and executed under the UNIX operating system (Solaris).
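In the usual formulation of asynchronous round robin, each processor keeps its own private target counter and, when it runs out of work, asks the processor named by that counter before advancing it modulo the number of processors. The sketch below shows only this donor-selection step, in Python rather than the paper's C and MPI. The class name `ARRWorker` is hypothetical, and skipping the worker's own rank is an added convenience, not something stated in the abstract.

```python
class ARRWorker:
    """Donor selection for asynchronous round robin (illustrative only)."""

    def __init__(self, rank, num_workers):
        if num_workers < 2:
            raise ValueError("need at least two workers to share work")
        self.rank = rank
        self.num_workers = num_workers
        # Each worker starts by targeting its right-hand neighbour.
        self.target = (rank + 1) % num_workers

    def next_donor(self):
        """Return the rank to ask for work and advance the local counter."""
        donor = self.target
        nxt = (self.target + 1) % self.num_workers
        if nxt == self.rank:  # assumption: never ask ourselves for work
            nxt = (nxt + 1) % self.num_workers
        self.target = nxt
        return donor


worker = ARRWorker(rank=0, num_workers=4)
print([worker.next_donor() for _ in range(5)])  # [1, 2, 3, 1, 2]
```

Because every worker advances its own counter independently, no shared state or global synchronization is needed to choose a donor, which keeps interprocessor communication limited to the work requests themselves.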
Abstract: In this paper, we present an acceleration strategy for Smoothed Particle Hydrodynamics (SPH) on multi-GPU platforms. For a single GPU, we first use a neighborhood search algorithm that uses a compacted cell index combined with spatial domain characteristics. For multiple GPUs, we focus on how SPH's computational time changes. A simple dynamic load balancing algorithm works well because the computational time of each time step changes slowly relative to the previous time step. By further optimizing the dynamic load balancing algorithm and the communication strategy among GPUs, a nearly linear speedup is achieved in different scenarios with millions of particles. The quality and efficiency of our methods are demonstrated using multiple scenes with different particle numbers.
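Since the per-step time changes slowly, the timings of the previous step are a reasonable predictor for the next one. The abstract does not give the balancing rule, so the sketch below assumes a simple proportional scheme: each GPU receives a share of the particles proportional to the throughput it achieved in the last step. The function name `rebalance` and the rule are assumptions for illustration.

```python
def rebalance(counts, times):
    """Return new per-GPU particle counts for the next time step.

    counts: particles each GPU processed in the previous step (all > 0)
    times:  seconds each GPU spent on that step (all > 0)
    Assumed rule: share is proportional to measured throughput.
    """
    if len(counts) != len(times):
        raise ValueError("counts and times must have the same length")
    total = sum(counts)
    rates = [c / t for c, t in zip(counts, times)]  # particles per second
    rate_sum = sum(rates)
    new_counts = [int(total * r / rate_sum) for r in rates]
    # Truncation can lose a few particles; give them to the last GPU.
    new_counts[-1] += total - sum(new_counts)
    return new_counts


# The first GPU took twice as long for the same load, so it gets less work.
print(rebalance([1000, 1000], [2.0, 1.0]))  # [666, 1334]
```

In a real solver the counts would translate into moving the boundaries of the spatial subdomains, and only the particles near those boundaries would need to be transferred between GPUs.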