Dynamic distribution model is one of the best schemes for parallel volume rendering. How- ever, in homogeneous cluster system.since the granularity is traditionally identical, all processors communicate almost simulta...Dynamic distribution model is one of the best schemes for parallel volume rendering. How- ever, in homogeneous cluster system.since the granularity is traditionally identical, all processors communicate almost simultaneously and computation load may lose balance. Due to problems above, a dynamic distribution model with prime granularity for parallel computing is presented. Granularities of each processor are relatively prime, and related theories are introduced. A high parallel performance can be achieved by minimizing network competition and using a load balancing strategy that ensures all processors finish almost simultaneously. Based on Master-Slave-Gleaner ( MSG) scheme, the parallel Splatting Algorithm for volume rendering is used to test the model on IBM Cluster 1350 system. The experimental results show that the model can bring a considerable improvement in performance, including computation efficiency, total execution time, speed, and load balancing.展开更多
Load balancing is an important stage of a system using parallel computing where the aim is the balance of workload among all processors of the system. In this paper, we introduce a new load balancing algorithm with ne...Load balancing is an important stage of a system using parallel computing where the aim is the balance of workload among all processors of the system. In this paper, we introduce a new load balancing algorithm with new capabilities for parallel systems, among which is the independence of a separate route-finder algorithm between the load receiver and sender nodes. In addition to simulation of the new algorithm, due to similarity in behavior to the proposed algorithm, the central algorithm is simulated. Simulation results show that, the system performance increases with the increase of the degree of neighborhood between the processors. These results also indicate the algorithm’s high compatibility with environment changes.展开更多
The real problem in cluster of workstations is the changes in workstation power or number of workstations or dynmaic changes in the run time behavior of the application hamper the efficient use of resources. Dynamic l...The real problem in cluster of workstations is the changes in workstation power or number of workstations or dynmaic changes in the run time behavior of the application hamper the efficient use of resources. Dynamic load balancing is a technique for the parallel implementation of problems, which generate unpredictable workloads by migration work units from heavily loaded processor to lightly loaded processors at run time. This paper proposed an efficient load balancing method in which parallel tree computations depth first search (DFS) generates unpredictable, highly imbalance workloads and moves through different phases detectable at run time, where dynamic load balancing strategy is applicable in each phase running under the MPI(message passing interface) and Unix operating system on cluster of workstations parallel platform computing.展开更多
The rapid growth of interconnected high performance workstations has produced a new computing paradigm called clustered of workstations computing. In these systems load balance problem is a serious impediment to achie...The rapid growth of interconnected high performance workstations has produced a new computing paradigm called clustered of workstations computing. In these systems load balance problem is a serious impediment to achieve good performance. The main concern of this paper is the implementation of dynamic load balancing algorithm, asynchronous Round Robin (ARR), for balancing workload of parallel tree computation depth-first-search algorithm on Cluster of Heterogeneous Workstations (COW) Many algorithms in artificial intelligence and other areas of computer science are based on depth first search in implicitty defined trees. For these algorithms a load-balancing scheme is required, which is able to evenly distribute parts of an irregularly shaped tree over the workstations with minimal interprocessor communication and without prior knowledge of the tree’s shape. For the (ARR) algorithm only minimal interprocessor communication is needed when necessary and it runs under the MPI (Message passing interface) that allows parallel execution on heterogeneous SUN cluster of workstation platform. The program code is written in C language and executed under UNIX operating system (Solaris version).展开更多
Large-scale parallelization of molecular dynamics simulations is facing challenges which seriously affect the simula- tion efficiency, among which the load imbalance problem is the most critical. In this paper, we pro...Large-scale parallelization of molecular dynamics simulations is facing challenges which seriously affect the simula- tion efficiency, among which the load imbalance problem is the most critical. In this paper, we propose, a new molecular dynamics static load balancing method (MDSLB). By analyzing the characteristics of the short-range force of molecular dynamics programs running in parallel, we divide the short-range force into three kinds of force models, and then pack- age the computations of each force model into many tiny computational units called "cell loads", which provide the basic data structures for our load balancing method. In MDSLB, the spatial region is separated into sub-regions called "local domains", and the cell loads of each local domain are allocated to every processor in turn. Compared with the dynamic load balancing method, MDSLB can guarantee load balance by executing the algorithm only once at program startup without migrating the loads dynamically. We implement MDSLB in OpenFOAM software and test it on TianHe-lA supercomputer with 16 to 512 processors. Experimental results show that MDSLB can save 34%-64% time for the load imbalanced cases.展开更多
This paper presented an idea to replace the traditionally expensive parallel machines by heterogeneous cluster of workstations. To emphasise the usability of cluster of workstations platform for parallel and distribut...This paper presented an idea to replace the traditionally expensive parallel machines by heterogeneous cluster of workstations. To emphasise the usability of cluster of workstations platform for parallel and distributed computing, also the paper presented the status report on the effort and experiences for the implementation of a dynamic load balancing for parallel tree computation depth first search(DFS) on the cluster of a workstations project. It compared the speedup performance obtained from our platform with that obtained from the traditional one. The speedup results show that cluster of workstations can be a serious alternative to the expensive parallel machines.展开更多
Dynamic task assignment and migration are the key technique to load balancing which plays an important role in the achievement of high performance in distributed computing system. In this paper, we describe the design...Dynamic task assignment and migration are the key technique to load balancing which plays an important role in the achievement of high performance in distributed computing system. In this paper, we describe the design and implementation of an online thread scheduling and migration system (S&M) based on a previous work of LWP -MPI. Experimental results show that performance is enhanced.展开更多
To efficiently complete a complex computation task,the complex task should be decomposed into subcomputation tasks that run parallel in edge computing.Wireless Sensor Network(WSN)is a typical application of parallel c...To efficiently complete a complex computation task,the complex task should be decomposed into subcomputation tasks that run parallel in edge computing.Wireless Sensor Network(WSN)is a typical application of parallel computation.To achieve highly reliable parallel computation for wireless sensor network,the network's lifetime needs to be extended.Therefore,a proper task allocation strategy is needed to reduce the energy consumption and balance the load of the network.This paper proposes a task model and a cluster-based WSN model in edge computing.In our model,different tasks require different types of resources and different sensors provide different types of resources,so our model is heterogeneous,which makes the model more practical.Then we propose a task allocation algorithm that combines the Genetic Algorithm(GA)and the Ant Colony Optimization(ACO)algorithm.The algorithm concentrates on energy conservation and load balancing so that the lifetime of the network can be extended.The experimental result shows the algorithm's effectiveness and advantages in energy conservation and load balancing.展开更多
In order to improve the concurrent access performance of the web-based spatial computing system in cluster,a parallel scheduling strategy based on the multi-core environment is proposed,which includes two levels of pa...In order to improve the concurrent access performance of the web-based spatial computing system in cluster,a parallel scheduling strategy based on the multi-core environment is proposed,which includes two levels of parallel processing mechanisms.One is that it can evenly allocate tasks to each server node in the cluster and the other is that it can implement the load balancing inside a server node.Based on the strategy,a new web-based spatial computing model is designed in this paper,in which,a task response ratio calculation method,a request queue buffer mechanism and a thread scheduling strategy are focused on.Experimental results show that the new model can fully use the multi-core computing advantage of each server node in the concurrent access environment and improve the average hits per second,average I/O Hits,CPU utilization and throughput.Using speed-up ratio to analyze the traditional model and the new one,the result shows that the new model has the best performance.The performance of the multi-core server nodes in the cluster is optimized;the resource utilization and the parallel processing capabilities are enhanced.The more CPU cores you have,the higher parallel processing capabilities will be obtained.展开更多
Task scheduling determines the performance of NOW computing to a large extent. However, the computer system architecture, computing capability and system load are rarely proposed together. In this paper, a biggest het...Task scheduling determines the performance of NOW computing to a large extent. However, the computer system architecture, computing capability and system load are rarely proposed together. In this paper, a biggest heterogeneous scheduling algorithm is presented. It fully considers the system characteristics (from application view), structure and state. So it always can utilize all processing resource under a reasonable premise. The results of experiment show the algorithm can significantly shorten the response time of jobs.展开更多
In this paper, we propose a decentralized parallel computation model for global optimization using interval analysis. The model is adaptive to any number of processors and the workload is automatically and evenly dist...In this paper, we propose a decentralized parallel computation model for global optimization using interval analysis. The model is adaptive to any number of processors and the workload is automatically and evenly distributed among all processors by alternative message passing. The problems received by each processor are processed based on their local dominance properties, which avoids unnecessary interval evaluations. Further, the problem is treated as a whole at the beginning of computation so that no initial decomposition scheme is required. Numerical experiments indicate that the model works well and is stable with different number of parallel processors, distributes the load evenly among the processors, and provides an impressive speedup, especially when the problem is time-consuming to solve.展开更多
The Global/Regional Assimilation and PrEdiction System(GRAPES)is a new-generation operational numerical weather prediction(NWP)model developed by the China Meteorological Administration(CMA).It is a grid-point m...The Global/Regional Assimilation and PrEdiction System(GRAPES)is a new-generation operational numerical weather prediction(NWP)model developed by the China Meteorological Administration(CMA).It is a grid-point model with a code structure different from that of spectral models used in other operational NWP centers such as the European Centre for Medium-Range Weather Forecasts(ECMWF),National Centers for Environmental Prediction(NCEP),and Japan Meteorological Agency(JMA),especially in the context of parallel computing.In the GRAPES global model,a semi-implicit semi-Lagrangian scheme is used for the discretization over a sphere,which requires careful planning for the busy communications between the arrays of processors,because the Lagrangian differential scheme results in shortened trajectories interpolated between the grid points at the poles and in the associated adjacent areas.This means that the latitude-longitude partitioning is more complex for the polar processors.Therefore,a parallel strategy with efficient computation,balanced load,and synchronous communication shall be developed.In this paper,a message passing approach based on MPI(Message Passing Interface)group communication is proposed.Its key-point is to group the polar processors in row with matrix-topology during the processor partitioning.A load balance task distribution algorithm is also discussed.Test runs on the IBM-cluster 1600 at CMA show that the new algorithm is of desired scalability,and the readjusted load balance scheme can reduce the absolute wall clock time by 10% or more.The quasi-operational runs of the model demonstrate that the wall clock time secured by the strategy meets the real-time needs of NWP operations.展开更多
基金Supported by Natural Science Foundation of China ( No. 60373061).
文摘Dynamic distribution model is one of the best schemes for parallel volume rendering. How- ever, in homogeneous cluster system.since the granularity is traditionally identical, all processors communicate almost simultaneously and computation load may lose balance. Due to problems above, a dynamic distribution model with prime granularity for parallel computing is presented. Granularities of each processor are relatively prime, and related theories are introduced. A high parallel performance can be achieved by minimizing network competition and using a load balancing strategy that ensures all processors finish almost simultaneously. Based on Master-Slave-Gleaner ( MSG) scheme, the parallel Splatting Algorithm for volume rendering is used to test the model on IBM Cluster 1350 system. The experimental results show that the model can bring a considerable improvement in performance, including computation efficiency, total execution time, speed, and load balancing.
文摘Load balancing is an important stage of a system using parallel computing where the aim is the balance of workload among all processors of the system. In this paper, we introduce a new load balancing algorithm with new capabilities for parallel systems, among which is the independence of a separate route-finder algorithm between the load receiver and sender nodes. In addition to simulation of the new algorithm, due to similarity in behavior to the proposed algorithm, the central algorithm is simulated. Simulation results show that, the system performance increases with the increase of the degree of neighborhood between the processors. These results also indicate the algorithm’s high compatibility with environment changes.
基金Natural Science Foundation of China (No.60 173 0 3 1)
文摘The real problem in cluster of workstations is the changes in workstation power or number of workstations or dynmaic changes in the run time behavior of the application hamper the efficient use of resources. Dynamic load balancing is a technique for the parallel implementation of problems, which generate unpredictable workloads by migration work units from heavily loaded processor to lightly loaded processors at run time. This paper proposed an efficient load balancing method in which parallel tree computations depth first search (DFS) generates unpredictable, highly imbalance workloads and moves through different phases detectable at run time, where dynamic load balancing strategy is applicable in each phase running under the MPI(message passing interface) and Unix operating system on cluster of workstations parallel platform computing.
文摘The rapid growth of interconnected high performance workstations has produced a new computing paradigm called clustered of workstations computing. In these systems load balance problem is a serious impediment to achieve good performance. The main concern of this paper is the implementation of dynamic load balancing algorithm, asynchronous Round Robin (ARR), for balancing workload of parallel tree computation depth-first-search algorithm on Cluster of Heterogeneous Workstations (COW) Many algorithms in artificial intelligence and other areas of computer science are based on depth first search in implicitty defined trees. For these algorithms a load-balancing scheme is required, which is able to evenly distribute parts of an irregularly shaped tree over the workstations with minimal interprocessor communication and without prior knowledge of the tree’s shape. For the (ARR) algorithm only minimal interprocessor communication is needed when necessary and it runs under the MPI (Message passing interface) that allows parallel execution on heterogeneous SUN cluster of workstation platform. The program code is written in C language and executed under UNIX operating system (Solaris version).
基金Project supported by the National Natural Science Foundation of China (Grant Nos.61303071 and 61120106005)the Natural Science Fund from the Guangzhou Science and Information Technology Bureau (Grant No.134200026)
文摘Large-scale parallelization of molecular dynamics simulations is facing challenges which seriously affect the simula- tion efficiency, among which the load imbalance problem is the most critical. In this paper, we propose, a new molecular dynamics static load balancing method (MDSLB). By analyzing the characteristics of the short-range force of molecular dynamics programs running in parallel, we divide the short-range force into three kinds of force models, and then pack- age the computations of each force model into many tiny computational units called "cell loads", which provide the basic data structures for our load balancing method. In MDSLB, the spatial region is separated into sub-regions called "local domains", and the cell loads of each local domain are allocated to every processor in turn. Compared with the dynamic load balancing method, MDSLB can guarantee load balance by executing the algorithm only once at program startup without migrating the loads dynamically. We implement MDSLB in OpenFOAM software and test it on TianHe-lA supercomputer with 16 to 512 processors. Experimental results show that MDSLB can save 34%-64% time for the load imbalanced cases.
基金National Science Foundation of China(No.60 173 0 3 1)
文摘This paper presented an idea to replace the traditionally expensive parallel machines by heterogeneous cluster of workstations. To emphasise the usability of cluster of workstations platform for parallel and distributed computing, also the paper presented the status report on the effort and experiences for the implementation of a dynamic load balancing for parallel tree computation depth first search(DFS) on the cluster of a workstations project. It compared the speedup performance obtained from our platform with that obtained from the traditional one. The speedup results show that cluster of workstations can be a serious alternative to the expensive parallel machines.
文摘Dynamic task assignment and migration are the key technique to load balancing which plays an important role in the achievement of high performance in distributed computing system. In this paper, we describe the design and implementation of an online thread scheduling and migration system (S&M) based on a previous work of LWP -MPI. Experimental results show that performance is enhanced.
基金supported by Postdoctoral Science Foundation of China(No.2021M702441)National Natural Science Foundation of China(No.61871283)。
文摘To efficiently complete a complex computation task,the complex task should be decomposed into subcomputation tasks that run parallel in edge computing.Wireless Sensor Network(WSN)is a typical application of parallel computation.To achieve highly reliable parallel computation for wireless sensor network,the network's lifetime needs to be extended.Therefore,a proper task allocation strategy is needed to reduce the energy consumption and balance the load of the network.This paper proposes a task model and a cluster-based WSN model in edge computing.In our model,different tasks require different types of resources and different sensors provide different types of resources,so our model is heterogeneous,which makes the model more practical.Then we propose a task allocation algorithm that combines the Genetic Algorithm(GA)and the Ant Colony Optimization(ACO)algorithm.The algorithm concentrates on energy conservation and load balancing so that the lifetime of the network can be extended.The experimental result shows the algorithm's effectiveness and advantages in energy conservation and load balancing.
基金Supported by the China Postdoctoral Science Foundation(No.2014M552115)the Fundamental Research Funds for the Central Universities,ChinaUniversity of Geosciences(Wuhan)(No.CUGL140833)the National Key Technology Support Program of China(No.2011BAH06B04)
文摘In order to improve the concurrent access performance of the web-based spatial computing system in cluster,a parallel scheduling strategy based on the multi-core environment is proposed,which includes two levels of parallel processing mechanisms.One is that it can evenly allocate tasks to each server node in the cluster and the other is that it can implement the load balancing inside a server node.Based on the strategy,a new web-based spatial computing model is designed in this paper,in which,a task response ratio calculation method,a request queue buffer mechanism and a thread scheduling strategy are focused on.Experimental results show that the new model can fully use the multi-core computing advantage of each server node in the concurrent access environment and improve the average hits per second,average I/O Hits,CPU utilization and throughput.Using speed-up ratio to analyze the traditional model and the new one,the result shows that the new model has the best performance.The performance of the multi-core server nodes in the cluster is optimized;the resource utilization and the parallel processing capabilities are enhanced.The more CPU cores you have,the higher parallel processing capabilities will be obtained.
文摘Task scheduling determines the performance of NOW computing to a large extent. However, the computer system architecture, computing capability and system load are rarely proposed together. In this paper, a biggest heterogeneous scheduling algorithm is presented. It fully considers the system characteristics (from application view), structure and state. So it always can utilize all processing resource under a reasonable premise. The results of experiment show the algorithm can significantly shorten the response time of jobs.
文摘In this paper, we propose a decentralized parallel computation model for global optimization using interval analysis. The model is adaptive to any number of processors and the workload is automatically and evenly distributed among all processors by alternative message passing. The problems received by each processor are processed based on their local dominance properties, which avoids unnecessary interval evaluations. Further, the problem is treated as a whole at the beginning of computation so that no initial decomposition scheme is required. Numerical experiments indicate that the model works well and is stable with different number of parallel processors, distributes the load evenly among the processors, and provides an impressive speedup, especially when the problem is time-consuming to solve.
基金Supported by the National S&T Infrastructure Program for the 11th Five-Year Period under Grant No.2006BAC02B00the National Natural Science Foundation of China under Grant Nos.40575050 and 40775073
文摘The Global/Regional Assimilation and PrEdiction System(GRAPES)is a new-generation operational numerical weather prediction(NWP)model developed by the China Meteorological Administration(CMA).It is a grid-point model with a code structure different from that of spectral models used in other operational NWP centers such as the European Centre for Medium-Range Weather Forecasts(ECMWF),National Centers for Environmental Prediction(NCEP),and Japan Meteorological Agency(JMA),especially in the context of parallel computing.In the GRAPES global model,a semi-implicit semi-Lagrangian scheme is used for the discretization over a sphere,which requires careful planning for the busy communications between the arrays of processors,because the Lagrangian differential scheme results in shortened trajectories interpolated between the grid points at the poles and in the associated adjacent areas.This means that the latitude-longitude partitioning is more complex for the polar processors.Therefore,a parallel strategy with efficient computation,balanced load,and synchronous communication shall be developed.In this paper,a message passing approach based on MPI(Message Passing Interface)group communication is proposed.Its key-point is to group the polar processors in row with matrix-topology during the processor partitioning.A load balance task distribution algorithm is also discussed.Test runs on the IBM-cluster 1600 at CMA show that the new algorithm is of desired scalability,and the readjusted load balance scheme can reduce the absolute wall clock time by 10% or more.The quasi-operational runs of the model demonstrate that the wall clock time secured by the strategy meets the real-time needs of NWP operations.