Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 61303071 and 61120106005) and the Natural Science Fund from the Guangzhou Science and Information Technology Bureau (Grant No. 134200026).
Abstract: Large-scale parallelization of molecular dynamics simulations faces challenges that seriously affect simulation efficiency, among which load imbalance is the most critical. In this paper, we propose a new molecular dynamics static load balancing method (MDSLB). By analyzing the characteristics of the short-range force of molecular dynamics programs running in parallel, we divide the short-range force into three kinds of force models, and then package the computations of each force model into many tiny computational units called "cell loads", which provide the basic data structures for our load balancing method. In MDSLB, the spatial region is separated into sub-regions called "local domains", and the cell loads of each local domain are allocated to every processor in turn. Compared with dynamic load balancing methods, MDSLB can guarantee load balance by executing the algorithm only once at program startup, without migrating loads dynamically. We implement MDSLB in the OpenFOAM software and test it on the TianHe-1A supercomputer with 16 to 512 processors. Experimental results show that MDSLB can save 34%-64% of the run time in the load-imbalanced cases.
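The "in turn" allocation described in the abstract above can be illustrated with a small round-robin sketch. This is not the authors' OpenFOAM implementation: the function names, the dictionary layout, and the use of a single number as the cost of a cell load are all assumptions made for illustration.

```python
from collections import defaultdict

def allocate_cell_loads(local_domains, num_procs):
    """Deal the cell loads of each local domain to the processors in turn.

    local_domains maps a (hypothetical) domain id to a list of cell-load
    costs; the numeric cost is an assumption, not part of the abstract.
    """
    assignment = defaultdict(list)
    for domain_id, cell_loads in local_domains.items():
        for i, load in enumerate(cell_loads):
            # The i-th cell load of every domain goes to processor i mod P.
            assignment[i % num_procs].append((domain_id, load))
    return dict(assignment)

def per_processor_cost(assignment, num_procs):
    """Sum the cost of the cell loads assigned to each processor."""
    return [sum(load for _, load in assignment.get(p, []))
            for p in range(num_procs)]

# A dense region and a sparse region: a purely spatial split would give
# one processor 16 units of work and the other 4.
domains = {"dense": [4, 4, 4, 4], "sparse": [1, 1, 1, 1]}
assignment = allocate_cell_loads(domains, 2)
costs = per_processor_cost(assignment, 2)
print(costs)
```

Because every processor receives a share of every local domain, a dense region no longer lands on a single processor, which is the property that lets a one-time static allocation stay balanced in this toy case.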
Abstract: The rapid growth of interconnected high-performance workstations has produced a new computing paradigm called cluster-of-workstations computing. In these systems, the load balance problem is a serious impediment to achieving good performance. The main concern of this paper is the implementation of a dynamic load balancing algorithm, asynchronous Round Robin (ARR), for balancing the workload of a parallel depth-first-search tree computation on a Cluster of Heterogeneous Workstations (COW). Many algorithms in artificial intelligence and other areas of computer science are based on depth-first search in implicitly defined trees. These algorithms require a load-balancing scheme that can evenly distribute parts of an irregularly shaped tree over the workstations with minimal interprocessor communication and without prior knowledge of the tree's shape. The ARR algorithm communicates between processors only when necessary, and it runs under MPI (Message Passing Interface), which allows parallel execution on a heterogeneous SUN cluster-of-workstations platform. The program code is written in C and executed under the UNIX operating system (Solaris version).
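The abstract above does not spell out how ARR picks a processor to ask for work. The sketch below follows the scheme as it is commonly described in the parallel-search literature, where each processor keeps its own target pointer and advances it after every request; treating that as this paper's variant is an assumption, and the class and method names are hypothetical.

```python
class ARRProcessor:
    """One processor's view of asynchronous round robin (illustrative only)."""

    def __init__(self, rank, num_procs):
        if num_procs < 2:
            raise ValueError("ARR needs at least two processors")
        self.rank = rank
        self.num_procs = num_procs
        # Each processor starts by pointing at its right-hand neighbour.
        self.target = (rank + 1) % num_procs

    def next_donor(self):
        """Return the rank to ask for work, then advance the local pointer."""
        donor = self.target
        self.target = (self.target + 1) % self.num_procs
        if self.target == self.rank:
            # Never ask ourselves for work; skip our own rank.
            self.target = (self.target + 1) % self.num_procs
        return donor

# Processor 1 of 4 runs out of work four times in a row.
p = ARRProcessor(rank=1, num_procs=4)
requests = [p.next_donor() for _ in range(4)]
print(requests)
```

The pointer is local to each processor, so no shared counter has to be updated over the network; the only messages are the work requests themselves.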
Funding: National Science Foundation of China (No. 60173031).
Abstract: This paper presents the idea of replacing traditionally expensive parallel machines with a heterogeneous cluster of workstations. To emphasise the usability of the cluster-of-workstations platform for parallel and distributed computing, the paper also presents a status report on the effort and experience of implementing dynamic load balancing for parallel tree computation with depth-first search (DFS) in the cluster-of-workstations project. It compares the speedup obtained on our platform with that obtained on a traditional parallel machine. The speedup results show that a cluster of workstations can be a serious alternative to expensive parallel machines.
Abstract: Task scheduling determines the performance of NOW (network of workstations) computing to a large extent. However, the computer system architecture, computing capability, and system load are rarely considered together. In this paper, a biggest heterogeneous scheduling algorithm is presented. It fully considers the system characteristics (from the application's point of view), structure, and state, so it can always utilize all processing resources under a reasonable premise. Experimental results show that the algorithm can significantly shorten the response time of jobs.
Abstract: Load balancing is an important stage of a system using parallel computing, where the aim is to balance the workload among all processors of the system. In this paper, we introduce a new load balancing algorithm with new capabilities for parallel systems, one of which is that it does not depend on a separate route-finding algorithm between the load receiver and sender nodes. In addition to the new algorithm, we simulate the central algorithm, because its behavior is similar to that of the proposed algorithm. Simulation results show that system performance increases as the degree of neighborhood between the processors increases. These results also indicate that the algorithm adapts well to changes in the environment.
Abstract: It is desirable in a distributed system to have the system load balanced evenly among the nodes so that the mean job response time is minimized. In this paper, we present a dynamic load balancing mechanism (DLB). It adopts a centralized approach and is independent of the network topology. The DLB mechanism employs a set of thresholds that are automatically adjusted as the system load changes. It also provides a simple mechanism for the system to switch between periodic and instantaneous load balancing policies. The performance of the proposed algorithm is evaluated by intensive simulations for various parameters. The simulation results show that the mean job response time in a system implementing the DLB algorithm is significantly lower than in the same system without load balancing. Furthermore, compared with a previously proposed algorithm, the DLB algorithm demonstrates improved performance, especially when the system is heavily loaded and the load is unevenly distributed.
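One way to picture thresholds that adjust automatically with the system load is to derive them from the current mean load, as in the sketch below. The abstract does not give the actual threshold rule, so the fixed `spread` fraction around the mean, the function name, and the node names are assumptions.

```python
def classify_nodes(loads, spread=0.25):
    """Centralized classification of nodes into senders and receivers.

    loads maps a node name to its current queue length. The thresholds
    are recomputed from the mean on every call, so they move with the
    overall system load. The +/- spread rule is an illustrative assumption.
    """
    mean = sum(loads.values()) / len(loads)
    low, high = mean * (1 - spread), mean * (1 + spread)
    senders = [node for node, load in loads.items() if load > high]
    receivers = [node for node, load in loads.items() if load < low]
    return low, high, senders, receivers

# Light overall load: thresholds sit around a mean of 6 jobs.
low, high, senders, receivers = classify_nodes({"a": 10, "b": 2, "c": 6})
print(low, high, senders, receivers)

# Heavier overall load: the same call yields higher thresholds.
low2, high2, senders2, receivers2 = classify_nodes({"a": 20, "b": 4, "c": 12})
print(low2, high2, senders2, receivers2)
```

A central coordinator could call such a function either on a timer (periodic policy) or whenever a node reports a load change (instantaneous policy); which of the two the paper's switch mechanism selects, and when, is not stated in the abstract.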
Funding: Supported by the Natural Science Foundation of China (No. 60373061).
Abstract: The dynamic distribution model is one of the best schemes for parallel volume rendering. However, in a homogeneous cluster system, since the granularity is traditionally identical, all processors communicate almost simultaneously and the computation load may become unbalanced. To address these problems, a dynamic distribution model with prime granularity for parallel computing is presented. The granularities of the processors are relatively prime, and the related theory is introduced. High parallel performance can be achieved by minimizing network competition and using a load balancing strategy that ensures all processors finish almost simultaneously. Based on the Master-Slave-Gleaner (MSG) scheme, the parallel Splatting algorithm for volume rendering is used to test the model on an IBM Cluster 1350 system. The experimental results show that the model brings a considerable improvement in performance, including computation efficiency, total execution time, speed, and load balancing.
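To make "relatively prime granularities" concrete, the sketch below picks one task-chunk size per processor so that every pair of sizes has a greatest common divisor of 1. The greedy search and the starting size are assumptions for illustration; the abstract does not say how the paper chooses its granularities.

```python
from math import gcd

def coprime_granularities(num_procs, start=8):
    """Greedily pick num_procs chunk sizes that are pairwise coprime.

    start is a hypothetical smallest chunk size and must be at least 1.
    """
    chosen = []
    candidate = start
    while len(chosen) < num_procs:
        # Accept a size only if it shares no factor with any earlier one.
        if all(gcd(candidate, g) == 1 for g in chosen):
            chosen.append(candidate)
        candidate += 1
    return chosen

sizes = coprime_granularities(4)
print(sizes)
```

With identical chunk sizes on identical processors, every processor finishes its chunk and contacts the master at the same moments. With pairwise coprime sizes, two processors that start together next finish a chunk at the same instant only after a number of work units equal to the product of their sizes, which is the intuition behind reducing network competition.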
Abstract: Dynamic task assignment and migration are key techniques for load balancing, which plays an important role in achieving high performance in distributed computing systems. In this paper, we describe the design and implementation of an online thread scheduling and migration system (S&M) based on previous work on LWP-MPI. Experimental results show that performance is enhanced.
Abstract: In this paper, we propose a decentralized parallel computation model for global optimization using interval analysis. The model adapts to any number of processors, and the workload is automatically and evenly distributed among all processors by alternative message passing. The problems received by each processor are processed based on their local dominance properties, which avoids unnecessary interval evaluations. Further, the problem is treated as a whole at the beginning of the computation, so no initial decomposition scheme is required. Numerical experiments indicate that the model works well and is stable with different numbers of parallel processors, distributes the load evenly among the processors, and provides an impressive speedup, especially when the problem is time-consuming to solve.
Funding: Supported by the Doctoral Research Foundation of the Natural Science Foundation of Guangdong Province under Grant No. 8451064101000054; the National Natural Science Foundation of China under Grant Nos. 60773198 and 60703111; the Natural Science Foundation of Guangdong Province under Grant Nos. 06104916 and 8151027501000021; the Research Foundation of Science and Technology Plan Project in Guangdong Province under Grant No. 2008B050100040; the Program for New Century Excellent Talents in University of China under Grant No. NCET-06-0727; and the Fundamental Research Funds for the Central Universities, SCUT, under Grant No. 2009ZM0008.
Abstract: Many recent high-performance distributed computing environments offer high communication bandwidth. Such high-bandwidth distributed systems provide unprecedented opportunities for analyzing huge datasets, but they also pose new technical challenges. For users, progressive query answering is important. For system utilization, load balancing is critical. How to achieve progressive and load-balanced distributed computation is an interesting and promising research direction. As skyline analysis has been shown to be very useful in many multi-criteria decision-making applications, in this paper we study the problem of progressive and load-balanced distributed skyline analysis. We propose a simple yet scalable approach that has several nice properties for progressive and load-balanced query answering. We conduct extensive experiments that demonstrate the feasibility and effectiveness of the proposed method.
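For readers unfamiliar with skyline analysis, the sketch below computes a skyline on a single machine using the standard dominance definition: a point belongs to the skyline if no other point is at least as good in every criterion and strictly better in one. It is not the distributed, progressive method of the paper; the function names and the convention that smaller values are better are assumptions.

```python
def dominates(p, q):
    """True if p is no worse than q everywhere and strictly better somewhere.

    Smaller values are assumed to be better in every dimension.
    """
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))

def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical hotels as (price, distance to the beach).
hotels = [(1, 9), (3, 3), (4, 4), (9, 1), (5, 8)]
result = skyline(hotels)
print(result)
```

In a distributed setting each node can compute the skyline of its own partition with such a routine, but local skyline points may still be dominated by points held elsewhere, which is why partitioning and load balance matter for the overall query.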
Funding: Natural Science Foundation of China (No. 60173031).
Abstract: The real problem in a cluster of workstations is that changes in workstation power, in the number of workstations, or in the run-time behavior of the application hamper the efficient use of resources. Dynamic load balancing is a technique for the parallel implementation of problems that generate unpredictable workloads; it migrates work units from heavily loaded processors to lightly loaded processors at run time. This paper proposes an efficient load balancing method for parallel tree computation with depth-first search (DFS), which generates unpredictable, highly imbalanced workloads and moves through different phases detectable at run time; a dynamic load balancing strategy is applicable in each phase. The method runs under MPI (Message Passing Interface) and the Unix operating system on a cluster-of-workstations parallel computing platform.
Funding: Supported by the Postdoctoral Science Foundation of China (No. 2021M702441) and the National Natural Science Foundation of China (No. 61871283).
Abstract: To complete a complex computation task efficiently, the task should be decomposed into sub-computation tasks that run in parallel in edge computing. The Wireless Sensor Network (WSN) is a typical application of parallel computation. To achieve highly reliable parallel computation in a wireless sensor network, the network's lifetime needs to be extended. Therefore, a proper task allocation strategy is needed to reduce energy consumption and balance the load of the network. This paper proposes a task model and a cluster-based WSN model in edge computing. In our model, different tasks require different types of resources and different sensors provide different types of resources, so the model is heterogeneous, which makes it more practical. We then propose a task allocation algorithm that combines the Genetic Algorithm (GA) and the Ant Colony Optimization (ACO) algorithm. The algorithm concentrates on energy conservation and load balancing so that the lifetime of the network can be extended. The experimental results show the algorithm's effectiveness and advantages in energy conservation and load balancing.
Funding: Supported by the China Postdoctoral Science Foundation (No. 2014M552115), the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (No. CUGL140833), and the National Key Technology Support Program of China (No. 2011BAH06B04).
Abstract: To improve the concurrent access performance of a web-based spatial computing system in a cluster, a parallel scheduling strategy based on the multi-core environment is proposed, which includes two levels of parallel processing mechanisms. The first evenly allocates tasks to each server node in the cluster, and the second implements load balancing inside a server node. Based on this strategy, a new web-based spatial computing model is designed in this paper, focusing on a task response ratio calculation method, a request queue buffer mechanism, and a thread scheduling strategy. Experimental results show that the new model can fully use the multi-core computing advantage of each server node in the concurrent access environment and improve the average hits per second, average I/O hits, CPU utilization, and throughput. A comparison of the traditional model and the new one by speed-up ratio shows that the new model performs better. The performance of the multi-core server nodes in the cluster is optimized, and the resource utilization and parallel processing capabilities are enhanced. The more CPU cores are available, the higher the parallel processing capability.
Funding: Jointly supported by the Jiangsu Postgraduate Research and Practice Innovation Project under Grants KYCX22_1030, SJCX22_0283, and SJCX23_0293, and by the NUPTSF under Grant NY220201.
Abstract: Task scheduling plays a key role in effectively managing and allocating computing resources to meet various computing tasks in a cloud computing environment. Achieving short execution time and low load imbalance can be challenging for some algorithms in resource scheduling scenarios. In this work, the Hierarchical Particle Swarm Optimization-Evolutionary Artificial Bee Colony algorithm (HPSO-EABC) is proposed, which hybridizes our Evolutionary Artificial Bee Colony (EABC) algorithm and the Hierarchical Particle Swarm Optimization (HPSO) algorithm. The HPSO-EABC algorithm incorporates the advantages of both the HPSO and the EABC algorithms. Comprehensive testing has been carried out, including evaluations of algorithm convergence speed, resource execution time, load balancing, and operational costs. The results indicate that the EABC algorithm exhibits greater parallelism than the Artificial Bee Colony algorithm. Compared with the Particle Swarm Optimization algorithm, the HPSO algorithm not only improves the global search capability but also effectively mitigates getting stuck in local optima. As a result, the hybrid HPSO-EABC algorithm demonstrates significant improvements in stability and convergence speed. Moreover, it exhibits enhanced resource scheduling performance in both homogeneous and heterogeneous environments, effectively reducing execution time and cost, which is also verified by the ablation experiments.