Funding: Supported by the China Postdoctoral Science Foundation (No. 2014M552115), the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (No. CUGL140833), and the National Key Technology Support Program of China (No. 2011BAH06B04).
Abstract: To improve the concurrent access performance of a web-based spatial computing system deployed on a cluster, a parallel scheduling strategy for multi-core environments is proposed. It comprises two levels of parallel processing: tasks are allocated evenly across the server nodes of the cluster, and load is balanced inside each server node. Based on this strategy, a new web-based spatial computing model is designed, focusing on a task response ratio calculation method, a request queue buffer mechanism, and a thread scheduling strategy. Experimental results show that the new model makes full use of the multi-core computing capacity of each server node under concurrent access and improves average hits per second, average I/O hits, CPU utilization, and throughput. A speed-up ratio comparison between the traditional model and the new one shows that the new model performs better. The performance of the multi-core server nodes in the cluster is optimized, and resource utilization and parallel processing capability are enhanced. The more CPU cores a node has, the greater the parallel processing capability obtained.
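The abstract names a task response ratio calculation and a request queue buffer but does not give their definitions. As a rough illustration only, the sketch below assumes the classic highest-response-ratio-next rule, ratio = (waiting time + service time) / service time; the `Request` fields, `response_ratio`, and `pick_next` are hypothetical names, not the paper's.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    arrival: float      # time the request entered the buffer
    est_service: float  # estimated service time (assumed to be known)


def response_ratio(req, now):
    # Assumed HRRN formula: (waiting + service) / service.
    waiting = now - req.arrival
    return (waiting + req.est_service) / req.est_service


def pick_next(buffer, now):
    """Remove and return the buffered request with the highest response ratio."""
    best = max(buffer, key=lambda r: response_ratio(r, now))
    buffer.remove(best)
    return best


if __name__ == "__main__":
    queue = [Request("long", arrival=0.0, est_service=10.0),
             Request("short", arrival=4.0, est_service=1.0)]
    print(pick_next(queue, now=5.0).name)
```

Under this rule a short request overtakes a long one quickly, while the long request's ratio keeps growing as it waits, so it is not starved.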
Abstract: A new file assignment strategy for parallel I/O on cluster computing systems, the heuristic file sorted assignment algorithm, is proposed. To balance load, it assigns files with similar service times to the same disk. First, the files are sorted in descending order of service time and stored in a set I; then, when files are to be assigned, one disk of a cluster node is selected at random; finally, consecutive files are taken in order from the set I and placed on that disk until the disk reaches its maximum load. Experimental results show that the new strategy improves performance by 20.2% when the system load is light and by 31.6% when the load is heavy. The higher the data access rate, the more evident the performance improvement obtained by the heuristic file sorted assignment algorithm.
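The three steps in this abstract can be sketched as follows. This is a minimal reading of the description, not the authors' implementation: it assumes a disk's load is the sum of the service times of its files, that every disk has the same cap `max_load`, and that a disk is not reused once filled. The function and parameter names are hypothetical.

```python
import random


def sorted_file_assignment(service_times, num_disks, max_load, seed=0):
    """Group files with similar service times onto the same disk.

    service_times maps a file name to its service time. Load is assumed
    to be the sum of service times on a disk (an assumption of this sketch).
    """
    rng = random.Random(seed)
    # Step 1: the set I, files in descending order of service time.
    ordered = sorted(service_times, key=service_times.get, reverse=True)
    disks = {d: [] for d in range(num_disks)}
    free = list(range(num_disks))
    i = 0
    while i < len(ordered):
        if not free:
            raise ValueError("total load exceeds the capacity of the disks")
        # Step 2: select one of the unfilled disks at random.
        disk = free.pop(rng.randrange(len(free)))
        load = 0.0
        # Step 3: take consecutive files from I until the cap would be exceeded.
        while i < len(ordered) and load + service_times[ordered[i]] <= max_load:
            disks[disk].append(ordered[i])
            load += service_times[ordered[i]]
            i += 1
        if not disks[disk]:
            raise ValueError("a single file exceeds max_load")
    return disks


if __name__ == "__main__":
    times = {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1}
    print(sorted_file_assignment(times, num_disks=3, max_load=9))
```

Because the files are consumed in sorted order, each disk ends up holding a contiguous run of similar service times, whichever disk the random choice picks.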
Abstract: Large-scale computations are common in science and engineering, in areas such as numerical weather forecasting, astrophysics, energy resources exploration, nuclear weapon design, and plasma fusion research. Many applications in these areas need supercomputing power. The traditional mode of sequential processing cannot meet the demands of these computations; thus parallel processing (PP) is now the main approach to high performance computing (HPC).
Funding: Natural Science Foundation of China (No. 60173031)
Abstract: The real problem in a cluster of workstations is that changes in workstation power, in the number of workstations, or in the run-time behavior of the application hamper the efficient use of resources. Dynamic load balancing is a technique for the parallel implementation of problems that generate unpredictable workloads: it migrates work units from heavily loaded processors to lightly loaded processors at run time. This paper proposes an efficient load balancing method for parallel tree computations using depth-first search (DFS), which generate unpredictable, highly imbalanced workloads and move through different phases detectable at run time; a dynamic load balancing strategy is applied in each phase. The method runs under MPI (Message Passing Interface) and the Unix operating system on a cluster-of-workstations parallel platform.
Funding: Project supported by the Key Project Science Foundation of the Shanghai Municipal Commission of Education (Grant No. 03AZ03)
Abstract: A parallel finite element method using the domain decomposition technique is adapted to the distributed parallel environment of a workstation cluster. An algorithm is presented for parallelizing the preconditioned conjugate gradient method based on domain decomposition. Using the developed code, a dam structural analysis problem is solved on the workstation cluster and results are given. The parallel performance is analyzed.
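For readers unfamiliar with the solver named in this abstract, the sketch below shows a serial preconditioned conjugate gradient iteration. It is not the paper's algorithm: it swaps in a simple Jacobi (diagonal) preconditioner in place of the domain-decomposition preconditioner, runs on one process, and uses a small dense matrix. In a parallel version the matrix-vector product and the dot products are the steps that would be distributed over subdomains.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A.

    Uses a Jacobi preconditioner M = diag(A), an assumption of this sketch.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                                  # residual b - A x for x = 0
    z = [r[i] / A[i][i] for i in range(n)]       # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:               # stop once the residual is small
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x


if __name__ == "__main__":
    print(pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```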
Abstract: In recent years, high performance scientific computing on workstation clusters connected by a local area network has become a hot topic. Because latency is longer and protocol processing overhead is higher relative to the capacity of a powerful single workstation, it is becoming critically important to balance not only the numerical load but also the communication load, and to overlap communication with computation during parallel computing. Hence, efficiency evaluation rules must reveal these capabilities of a given parallel algorithm so that the existing algorithm can be optimized to attain its highest parallel efficiency. The traditional efficiency evaluation rules can no longer do this. Building on Culler's detailed discussion in the LogP model of interconnection networks for MPP systems, this paper presents a system of efficiency evaluation rules for parallel computations on a workstation cluster with the PVM3.0 parallel software framework. These rules satisfy the above requirements. Finally, two typical applications, one synchronous and one asynchronous, are designed to verify the validity of these rules on a cluster of 4 SGI workstations connected by Ethernet.
Abstract: The rapid growth of interconnected high performance workstations has produced a new computing paradigm called cluster-of-workstations computing. In these systems, load imbalance is a serious impediment to achieving good performance. The main concern of this paper is the implementation of a dynamic load balancing algorithm, asynchronous Round Robin (ARR), for balancing the workload of a parallel depth-first-search tree computation on a Cluster of Heterogeneous Workstations (COW). Many algorithms in artificial intelligence and other areas of computer science are based on depth-first search in implicitly defined trees. These algorithms require a load-balancing scheme that can evenly distribute parts of an irregularly shaped tree over the workstations with minimal interprocessor communication and without prior knowledge of the tree's shape. The ARR algorithm needs only minimal interprocessor communication, and only when necessary; it runs under MPI (Message Passing Interface), which allows parallel execution on a heterogeneous cluster of SUN workstations. The program code is written in C and executed under the UNIX operating system (Solaris version).
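The abstract does not spell out the ARR rule, so the sketch below follows the usual textbook formulation as an assumption: every processor keeps its own private target pointer, and when it runs out of work it asks the processor the pointer names and then advances the pointer by one, modulo the number of processors. The stack-splitting policy (donate the half nearest the bottom of the DFS stack) is also an assumption, and all names are hypothetical. The paper's implementation is in C under MPI; this is a single-process model of the bookkeeping only.

```python
def init_targets(num_procs):
    # Each processor initially points at its right-hand neighbour.
    return [(p + 1) % num_procs for p in range(num_procs)]


def arr_next_target(targets, requester, num_procs):
    """Return the processor `requester` should ask for work.

    Only the requester's private pointer is advanced, so no global
    state is shared between processors. Requires num_procs >= 2.
    """
    target = targets[requester]
    nxt = (target + 1) % num_procs
    if nxt == requester:            # never point a processor at itself
        nxt = (nxt + 1) % num_procs
    targets[requester] = nxt
    return target


def split_stack(stack):
    """Donor gives away the half of its DFS stack nearest the bottom.

    Returns (kept, donated). Nodes near the bottom are assumed to root
    larger unexplored subtrees.
    """
    half = len(stack) // 2
    return stack[half:], stack[:half]


if __name__ == "__main__":
    targets = init_targets(4)
    print([arr_next_target(targets, 0, 4) for _ in range(4)])
    print(split_stack(["n1", "n2", "n3", "n4", "n5"]))
```

Because each pointer is local, a request costs one message and no coordination, which matches the abstract's emphasis on minimal interprocessor communication.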
Funding: Supported by the National Natural Science Foundation of China (10372114) and the Engineering and Physical Sciences Research Council (EPSRC) of the UK (GR/R21219)
Abstract: A computational strategy is presented for the nonlinear dynamic analysis of large-scale combined finite/discrete element systems on a PC cluster. In this strategy, a dual-level domain decomposition scheme is adopted to implement dynamic domain decomposition. The domain decomposition approach matches the requirement of reducing the memory needed per processor. To treat the contact between boundary elements in neighbouring subdomains, the elements in a subdomain are classified into internal, interfacial and external elements. In this way, all the contact detection algorithms developed for sequential computation can be adopted directly in the parallel computation. Numerical examples show that this implementation is suitable for simulating large-scale problems. Two typical numerical examples are given to demonstrate the parallel efficiency and scalability on a PC cluster.
Abstract: Simulation has become essential in designing and developing new casting products and in improving manufacturing processes within limited time, because it reproduces the behavior of the process so that developers can arrive at sound casting designs. Many development groups around the world are competing to take the lead in the commercial simulation market. They have already reported success stories in manufacturing by developing and providing high performance, multipurpose simulation technologies. However, these tools are run mainly by well-trained experts on powerful desk-side computers, which makes it hard to spread the concept of scientific design to newcomers in the casting field. To address this, we used information technologies and mature hardware infrastructure to promote effective, scientific casting design, and integrated them into a Simulation Portal on the web. The portal offers scientific casting design over the network, with ubiquitous access embodied in the "Anyone, Anytime, Anywhere" concept.
Funding: National Science Foundation of China (No. 60173031)
Abstract: This paper presents the idea of replacing traditionally expensive parallel machines with a heterogeneous cluster of workstations. To emphasise the usability of the cluster-of-workstations platform for parallel and distributed computing, the paper also reports on the effort and experience of implementing dynamic load balancing for a parallel depth-first-search (DFS) tree computation in the cluster-of-workstations project. It compares the speedup obtained on our platform with that obtained on the traditional one. The speedup results show that a cluster of workstations can be a serious alternative to expensive parallel machines.
Funding: This work is supported by the Fujian Province Natural Science Foundation (No. 2008J0180) and the Scientific Research Start Foundation of Fujian University of Technology (No. GY-Z0707).
Abstract: The investigation is generalized to clusters with sizes up to 3000 atoms, thereby covering the range of sizes experimentally available for low energy cluster beam deposition. The atomic scale modeling is carried out by both Molecular Dynamics and Metropolis Monte Carlo. This represents a large series of simulations (175 cases), to which further calculations are added selectively when finer tuning of the parameters is necessary. Analyzing the results is a major task that is still in progress. In this way, not only a realistic range of sizes is covered, but also the whole range of compositions and the temperature range relevant to the solid and liquid states.
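The Metropolis Monte Carlo method mentioned here rests on one acceptance rule: a trial move that lowers the energy is always accepted, and one that raises it by ΔE is accepted with probability exp(−ΔE / (k_B T)). The sketch below shows only that rule. The abstract does not describe the trial moves or the interatomic potential, so those are left out, and the function name and the choice of units (eV and K) are assumptions.

```python
import math
import random

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def metropolis_accept(delta_e, temperature, rng):
    """Decide whether to accept a trial move.

    delta_e is the energy change of the move in eV (assumed unit),
    temperature is in kelvin, rng is a random.Random instance.
    """
    if delta_e <= 0.0:
        return True  # downhill moves are always accepted
    # Uphill moves are accepted with the Boltzmann probability.
    return rng.random() < math.exp(-delta_e / (K_B_EV_PER_K * temperature))


if __name__ == "__main__":
    rng = random.Random(1)
    trials = 10000
    accepted = sum(metropolis_accept(0.05, 1000.0, rng) for _ in range(trials))
    print(accepted / trials)
```

Raising the temperature raises the acceptance rate of uphill moves, which is how a single rule can sample both solid-like and liquid-like states.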
Abstract: This paper describes parallel simulation techniques for the discrete element method (DEM) on multi-core processors. Recently, multi-core CPU and GPU processors have attracted much attention for accelerating computer simulations in various fields. We propose a new algorithm for multi-thread parallel computation of DEM that makes effective use of the available memory and accelerates the computation. The study shows that memory usage is drastically reduced by this algorithm. To show the practical use of DEM in industry, a large-scale powder system with a complicated drive unit is simulated. We compare the simulation performance of the latest GPU and CPU processors, using programs optimized for each processor. The results show that the difference in performance between GPUs and CPUs is not substantial when a multi-thread parallel algorithm is used. In addition, the DEM algorithm is shown to have high scalability in multi-thread parallel computation on a CPU.
Funding: the National Natural Science Foundation of China (69603005), and the Science Foundation of Shanghai Municipal Commission of Sc
Abstract: Using commodity SMPs (shared memory processors) to build cluster-based supercomputers has become a mainstream trend. Yet programming this kind of system requires an environment that supports both message passing and shared memory programming. This paper describes our preliminary work on targeting a BSP library at clusters of SMPs. To exploit the maximum performance potential of a cluster of SMPs, we adopt threads to reduce system overhead and to exploit the capacity of the SMPs. A fore-layer synchronization mechanism is proposed to support barrier synchronization within an SMP node, within a group of SMP nodes, and across the whole cluster, respectively. A comparison is made between our BSP library and currently available BSP libraries such as PUB.