This study explores the application of parallel algorithms to enhance large-scale sorting, focusing on the QuickSort method. Implemented in both sequential and parallel forms, the paper provides a detailed comparison ...This study explores the application of parallel algorithms to enhance large-scale sorting, focusing on the QuickSort method. Implemented in both sequential and parallel forms, the paper provides a detailed comparison of their performance. This study investigates the efficacy of both techniques through the lens of array generation and pivot selection to manage datasets of varying sizes. This study meticulously documents the performance metrics, recording 16,499.2 milliseconds for the serial implementation and 16,339 milliseconds for the parallel implementation when sorting an array by using C++ chrono library. These results suggest that while the performance gains of the parallel approach over its serial counterpart are not immediately pronounced for smaller datasets, the benefits are expected to be more substantial as the dataset size increases.展开更多
Local mesh refinement is one of the key steps in the implementations of adaptive finite element methods. This paper presents a parallel algorithm for distributed memory parallel computers for adaptive local refinement...Local mesh refinement is one of the key steps in the implementations of adaptive finite element methods. This paper presents a parallel algorithm for distributed memory parallel computers for adaptive local refinement of tetrahedral meshes using bisection. This algorithm is used in PHG, Parallel Hierarchical Grid Chttp://lsec. cc. ac. cn/phg/), a toolbox under active development for parallel adaptive finite element solutions of partial differential equations. The algorithm proposed is characterized by allowing simukaneous refinement of submeshes to arbitrary levels before synchronization between submeshes and without the need of a central coordinator process for managing new vertices. Using the concept of canonical refinement, a simple proof of the independence of the resulting mesh on the mesh partitioning is given, which is useful in better understanding the behaviour of the biseetioning refinement procedure.展开更多
In this paper a class of real-time parallel modified Rosenbrock methods of numerical simulation is constructed for stiff dynamic systems on a multiprocessor system, and convergence and numerical stability of these met...In this paper a class of real-time parallel modified Rosenbrock methods of numerical simulation is constructed for stiff dynamic systems on a multiprocessor system, and convergence and numerical stability of these methods are discussed. A-stable real-time parallel formula of two-stage third-order and A(α)-stable real-time parallel formula with o ≈ 89.96° of three-stage fourth-order are particularly given. The numerical simulation experiments in parallel environment show that the class of algorithms is efficient and applicable, with greater speedup.展开更多
This paper presents an efficient parallel algorithm for the shortest path problem in planar layered digraphs that runs in O(log^3n) time with n processors. The algorithms uses a divide and conquer approach and is base...This paper presents an efficient parallel algorithm for the shortest path problem in planar layered digraphs that runs in O(log^3n) time with n processors. The algorithms uses a divide and conquer approach and is based on the novel idea of a one-way separator, which has the property that any directed path can be crossed only once.展开更多
We introduced the work on parallel problem solvers from physics and biology being developed by the research team at the State Key Laboratory of Software Engineering, Wuhan University. Results on parallel solvers inclu...We introduced the work on parallel problem solvers from physics and biology being developed by the research team at the State Key Laboratory of Software Engineering, Wuhan University. Results on parallel solvers include the following areas: Evolutionary algorithms based on imitating the evolution processes of nature for parallel problem solving, especially for parallel optimization and model-building; Asynchronous parallel algorithms based on domain decomposition which are inspired by physical analogies such as elastic relaxation process and annealing process, for scientific computations, especially for solving nonlinear mathematical physics problems. All these algorithms have the following common characteristics: inherent parallelism, self-adaptation and self-organization, because the basic ideas of these solvers are from imitating the natural evolutionary processes.展开更多
In this paper, a parallel algorithm with iterative form for solving finite element equation is presented. Based on the iterative solution of linear algebra equations, the parallel computational steps are introduced in...In this paper, a parallel algorithm with iterative form for solving finite element equation is presented. Based on the iterative solution of linear algebra equations, the parallel computational steps are introduced in this method. Also by using the weighted residual method and choosing the appropriate weighting functions, the finite element basic form of parallel algorithm is deduced. The program of this algorithm has been realized on the ELXSI-6400 parallel computer of Xi'an Jiaotong University. The computational results show the operational speed will be raised and the CPU time will be cut down effectively. So this method is one kind of effective parallel algorithm for solving the finite element equations of large-scale structures.展开更多
Different methods for revising propositional knowledge base have been proposed recently by several researchers, but all methods are intractable in the general case. For practical application, this paper presents a rev...Different methods for revising propositional knowledge base have been proposed recently by several researchers, but all methods are intractable in the general case. For practical application, this paper presents a revision method in special case, and gives a corresponding polynomial algorithm as well as its parallel version on CREW PRAM.展开更多
Precise integration methods to solve structural dynamic responses and the corresponding time integration formula are composed of two parts: the multiplication of an exponential matrix with a vector and the integratio...Precise integration methods to solve structural dynamic responses and the corresponding time integration formula are composed of two parts: the multiplication of an exponential matrix with a vector and the integration term. The second term can be solved by the series solution. Two hybrid granularity parallel algorithms are designed, that is, the exponential matrix and the first term are computed by the fine-grained parallel algorithra and the second term is computed by the coarse-grained parallel algorithm. Numerical examples show that these two hybrid granularity parallel algorithms obtain higher speedup and parallel efficiency than two existing parallel algorithms.展开更多
Presents a new parallel image matching algorithm based on the concept of entropy feature vector and suitable to SIMD computer, which, in comparison with other algorithms, has the following advantages:(1)The spatial in...Presents a new parallel image matching algorithm based on the concept of entropy feature vector and suitable to SIMD computer, which, in comparison with other algorithms, has the following advantages:(1)The spatial information of an image is appropriately introduced into the definition of image entropy. (2) A large number of multiplication operations are eliminated, thus the algorithm is sped up. (3) The shortcoming of having to do global calculation in the first instance is overcome, and concludes the algorithm has very good locality and is suitable for parallel processing.展开更多
The design of parallel algorithms is studied in this paper. These algorithms are applicable to shared memory MIMD machines In this paper, the emphasis is put on the methods for design of the efficient parallel algori...The design of parallel algorithms is studied in this paper. These algorithms are applicable to shared memory MIMD machines In this paper, the emphasis is put on the methods for design of the efficient parallel algorithms. The design of efficient parallel algorithms should be based on the following considerationst algorithm parallelism and the hardware-parallelism; granularity of the parallel algorithm, algorithm optimization according to the underling parallel machine. In this paper , these principles are applied to solve a model problem of the PDE. The speedup of the new method is high. The results were tested and evaluated on a shared memory MIMD machine. The practical results were agree with the predicted performance.展开更多
In Surface wave waveform inversion, we want to reconstruct 3D shear wave velocity structure, which calculation beyond the capability of the powerful present day personal computer or even workstation. So we designed a ...In Surface wave waveform inversion, we want to reconstruct 3D shear wave velocity structure, which calculation beyond the capability of the powerful present day personal computer or even workstation. So we designed a high paralleled algorithm and carried out the inversion on Parallel computer based on the partitioned waveform inversion (PWI). It partitions the large scale optimization problem into a number of independent small scale problems and reduces the computational effort by several orders of magnitude. We adopted surface waveform inversion with a equal block(2 o×2 o) discretization.展开更多
Basetl on the finite element solution of the parametric varialional principle of elastic con/del problem, a corresponding parallel algorithm has been created bv utilizing the specialities of parallel computer and the ...Basetl on the finite element solution of the parametric varialional principle of elastic con/del problem, a corresponding parallel algorithm has been created bv utilizing the specialities of parallel computer and the architecture of concurrent processing in this paper. In this algorithm. the parallelisms have heen realized in the processes of creation and assembly of stiffness matrix, of the static condensation, of the solution of stresses and in many other aspects. The programme of this algorithm has been realized on ELXSI-6400 parallel computer of Xi'an Jiaotong University. The results of computation show that the computational time can be saved efficiently and it is an effective parallel algorithm for the analyses of contact problems.展开更多
Based on the general methods in power flow calculation of power system and on conceptions and classifications of parallel algorithm, a new approach named Dynamic Asynchronous Parallel Algorithm that applies to the onl...Based on the general methods in power flow calculation of power system and on conceptions and classifications of parallel algorithm, a new approach named Dynamic Asynchronous Parallel Algorithm that applies to the online analysis and real-time dispatching and controlling of large-scale power network was put forward in this paper. Its performances of high speed and dynamic following have been verified on IEEE-14 bus system.展开更多
This paper presents an efficient parallel algorithm for the shortest-path problem in interval graph for computing shortest-paths in a weighted interval graph that runs in O(n) time with n intervals in a graph. A linea...This paper presents an efficient parallel algorithm for the shortest-path problem in interval graph for computing shortest-paths in a weighted interval graph that runs in O(n) time with n intervals in a graph. A linear processor CRCW algorithm for determining the shortest-paths in an interval graphs is given.展开更多
A new parallel algorithm is proposed for the knapsack problem where the method of divide and conquer is adopted. Based on an EREW-SIMD machine with shared memory, the proposed algorithm utilizes O(2 n/4 ) 1-ε ...A new parallel algorithm is proposed for the knapsack problem where the method of divide and conquer is adopted. Based on an EREW-SIMD machine with shared memory, the proposed algorithm utilizes O(2 n/4 ) 1-ε processors, 0≤ ε ≤1, and O(2 n/2 ) memory to find a solution for the n -element knapsack problem in time O(2 n/4 (2 n/4 ) ε) . The cost of the proposed parallel algorithm is O(2 n/2 ) , which is an optimal method for solving the knapsack problem without memory conflicts and an improved result over the past researches.展开更多
The solution of linear equation group can be applied to the oil exploration, the structure vibration analysis, the computational fluid dynamics, and other fields. When we make the in-depth analysis of some large or ve...The solution of linear equation group can be applied to the oil exploration, the structure vibration analysis, the computational fluid dynamics, and other fields. When we make the in-depth analysis of some large or very large complicated structures, we must use the parallel algorithm with the aid of high-performance computers to solve complex problems. This paper introduces the implementation process having the parallel with sparse linear equations from the perspective of sparse linear equation group.展开更多
On the basis of Floyd algorithm with the extended path matrix, a parallel algorithm which resolves all-pair shortest path (APSP) problem on cluster environment is analyzed and designed. Meanwhile, the parallel APSP ...On the basis of Floyd algorithm with the extended path matrix, a parallel algorithm which resolves all-pair shortest path (APSP) problem on cluster environment is analyzed and designed. Meanwhile, the parallel APSP pipelining algorithm makes full use of overlapping technique between computation and communication. Compared with broadcast operation, the parallel algorithm reduces communication cost. This algorithm has been implemented on MPI on PC-cluster. The theoretical analysis and experimental results show that the parallel algorithm is an efficient and scalable algorithm.展开更多
A parallel arithmetic program for the molecular dynamics (MD) simulation study of a large sized system consisting of 50 000100 000 atoms of liquid metals is reformed, based on the cascade arithmetic program used for t...A parallel arithmetic program for the molecular dynamics (MD) simulation study of a large sized system consisting of 50 000100 000 atoms of liquid metals is reformed, based on the cascade arithmetic program used for the molecular dynamics simulation study of a small sized system consisting of 5001 000 atoms. The program is used to simulate the rapid solidification processes of liquid metal Al system. Some new results, such as larger clusters composed of more than 36 smaller clusters (icosahedra or defect icosahedra) obtained in the system of 50 000 atoms, however, the larger clusters can not be seen in the small sized system of 5001 000 atoms. On the other hand, the results from this simulation study would be more closed to the real situation of the system under consideration because the influence of boundary conditions is decreased remarkably. It can be expected that from the parallel algorithm combined with the higher performance super computer, the total number of atoms in simulation system can be enlarged again up to tens, even hundreds times in the near future.展开更多
In Additive Manufacturing field, the current researches of data processing mainly focus on a slicing process of large STL files or complicated CAD models. To improve the efficiency and reduce the slicing time, a paral...In Additive Manufacturing field, the current researches of data processing mainly focus on a slicing process of large STL files or complicated CAD models. To improve the efficiency and reduce the slicing time, a parallel algorithm has great advantages. However, traditional algorithms can't make full use of multi-core CPU hardware resources. In the paper, a fast parallel algorithm is presented to speed up data processing. A pipeline mode is adopted to design the parallel algorithm. And the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, effects of threads number and layers number are investigated by a serial of experiments. The experimental results show that the threads number and layers number are two remarkable factors to the speedup ratio. The tendency of speedup versus threads number reveals a positive relationship which greatly agrees with the Amdahl's law, and the tendency of speedup versus layers number also keeps a positive relationship agreeing with Gustafson's law. The new algorithm uses topological information to compute contours with a parallel method of speedup. Another parallel algorithm based on data parallel is used in experiments to show that pipeline parallel mode is more efficient. A case study at last shows a suspending performance of the new parallel algorithm. Compared with the serial slicing algorithm, the new pipeline parallel algorithm can make full use of the multi-core CPU hardware, accelerate the slicing process, and compared with the data parallel slicing algorithm, the new slicing algorithm in this paper adopts a pipeline parallel model, and a much higher speedup ratio and efficiency is achieved.展开更多
The parallel algorithms of iterated defect correction methods (PIDeCM’s) are constructed, which are of efficiency and high order B-convergence for general nonlinear stiff systems in ODE’S. As the basis of constructi...The parallel algorithms of iterated defect correction methods (PIDeCM’s) are constructed, which are of efficiency and high order B-convergence for general nonlinear stiff systems in ODE’S. As the basis of constructing and discussing PIDeCM’s. a class of parallel one-leg methods is also investigated, which are of particular efficiency for linear systems.展开更多
文摘This study explores the application of parallel algorithms to enhance large-scale sorting, focusing on the QuickSort method. Implemented in both sequential and parallel forms, the paper provides a detailed comparison of their performance. This study investigates the efficacy of both techniques through the lens of array generation and pivot selection to manage datasets of varying sizes. This study meticulously documents the performance metrics, recording 16,499.2 milliseconds for the serial implementation and 16,339 milliseconds for the parallel implementation when sorting an array by using C++ chrono library. These results suggest that while the performance gains of the parallel approach over its serial counterpart are not immediately pronounced for smaller datasets, the benefits are expected to be more substantial as the dataset size increases.
基金supported by the 973 Program of China 2005CB321702China NSF 10531080.
文摘Local mesh refinement is one of the key steps in the implementations of adaptive finite element methods. This paper presents a parallel algorithm for distributed memory parallel computers for adaptive local refinement of tetrahedral meshes using bisection. This algorithm is used in PHG, Parallel Hierarchical Grid Chttp://lsec. cc. ac. cn/phg/), a toolbox under active development for parallel adaptive finite element solutions of partial differential equations. The algorithm proposed is characterized by allowing simukaneous refinement of submeshes to arbitrary levels before synchronization between submeshes and without the need of a central coordinator process for managing new vertices. Using the concept of canonical refinement, a simple proof of the independence of the resulting mesh on the mesh partitioning is given, which is useful in better understanding the behaviour of the biseetioning refinement procedure.
基金This project was supported by the National Natural Science Foundation of China (No. 19871080).
文摘In this paper a class of real-time parallel modified Rosenbrock methods of numerical simulation is constructed for stiff dynamic systems on a multiprocessor system, and convergence and numerical stability of these methods are discussed. A-stable real-time parallel formula of two-stage third-order and A(α)-stable real-time parallel formula with o ≈ 89.96° of three-stage fourth-order are particularly given. The numerical simulation experiments in parallel environment show that the class of algorithms is efficient and applicable, with greater speedup.
文摘This paper presents an efficient parallel algorithm for the shortest path problem in planar layered digraphs that runs in O(log^3n) time with n processors. The algorithms uses a divide and conquer approach and is based on the novel idea of a one-way separator, which has the property that any directed path can be crossed only once.
基金Supported by the National Natural Science Foundation of China( No.6 0 1330 10 ,No.70 0 710 42 ,No.6 0 0 730 43) andNational Laboratory for Parallel and Distributed Processing
文摘We introduced the work on parallel problem solvers from physics and biology being developed by the research team at the State Key Laboratory of Software Engineering, Wuhan University. Results on parallel solvers include the following areas: Evolutionary algorithms based on imitating the evolution processes of nature for parallel problem solving, especially for parallel optimization and model-building; Asynchronous parallel algorithms based on domain decomposition which are inspired by physical analogies such as elastic relaxation process and annealing process, for scientific computations, especially for solving nonlinear mathematical physics problems. All these algorithms have the following common characteristics: inherent parallelism, self-adaptation and self-organization, because the basic ideas of these solvers are from imitating the natural evolutionary processes.
基金This work has been carried out as of a research project which has been supported by the National Structural Strength & Vibration Laboratory of Xi'an Jiaotong University with National Fund
文摘In this paper, a parallel algorithm with iterative form for solving finite element equation is presented. Based on the iterative solution of linear algebra equations, the parallel computational steps are introduced in this method. Also by using the weighted residual method and choosing the appropriate weighting functions, the finite element basic form of parallel algorithm is deduced. The program of this algorithm has been realized on the ELXSI-6400 parallel computer of Xi'an Jiaotong University. The computational results show the operational speed will be raised and the CPU time will be cut down effectively. So this method is one kind of effective parallel algorithm for solving the finite element equations of large-scale structures.
文摘Different methods for revising propositional knowledge base have been proposed recently by several researchers, but all methods are intractable in the general case. For practical application, this paper presents a revision method in special case, and gives a corresponding polynomial algorithm as well as its parallel version on CREW PRAM.
基金the National Natural Science Foundation of China(No.60273048).
文摘Precise integration methods to solve structural dynamic responses and the corresponding time integration formula are composed of two parts: the multiplication of an exponential matrix with a vector and the integration term. The second term can be solved by the series solution. Two hybrid granularity parallel algorithms are designed, that is, the exponential matrix and the first term are computed by the fine-grained parallel algorithra and the second term is computed by the coarse-grained parallel algorithm. Numerical examples show that these two hybrid granularity parallel algorithms obtain higher speedup and parallel efficiency than two existing parallel algorithms.
文摘Presents a new parallel image matching algorithm based on the concept of entropy feature vector and suitable to SIMD computer, which, in comparison with other algorithms, has the following advantages:(1)The spatial information of an image is appropriately introduced into the definition of image entropy. (2) A large number of multiplication operations are eliminated, thus the algorithm is sped up. (3) The shortcoming of having to do global calculation in the first instance is overcome, and concludes the algorithm has very good locality and is suitable for parallel processing.
文摘The design of parallel algorithms is studied in this paper. These algorithms are applicable to shared memory MIMD machines In this paper, the emphasis is put on the methods for design of the efficient parallel algorithms. The design of efficient parallel algorithms should be based on the following considerationst algorithm parallelism and the hardware-parallelism; granularity of the parallel algorithm, algorithm optimization according to the underling parallel machine. In this paper , these principles are applied to solve a model problem of the PDE. The speedup of the new method is high. The results were tested and evaluated on a shared memory MIMD machine. The practical results were agree with the predicted performance.
基金Sponsored by NSFC(4973415 0 ) and National Defense Prediction F und
文摘In Surface wave waveform inversion, we want to reconstruct 3D shear wave velocity structure, which calculation beyond the capability of the powerful present day personal computer or even workstation. So we designed a high paralleled algorithm and carried out the inversion on Parallel computer based on the partitioned waveform inversion (PWI). It partitions the large scale optimization problem into a number of independent small scale problems and reduces the computational effort by several orders of magnitude. We adopted surface waveform inversion with a equal block(2 o×2 o) discretization.
基金Supported by the National Funds of National Structutal Vibration & Strength Laboratory of Xi'an Jiaotong University
文摘Basetl on the finite element solution of the parametric varialional principle of elastic con/del problem, a corresponding parallel algorithm has been created bv utilizing the specialities of parallel computer and the architecture of concurrent processing in this paper. In this algorithm. the parallelisms have heen realized in the processes of creation and assembly of stiffness matrix, of the static condensation, of the solution of stresses and in many other aspects. The programme of this algorithm has been realized on ELXSI-6400 parallel computer of Xi'an Jiaotong University. The results of computation show that the computational time can be saved efficiently and it is an effective parallel algorithm for the analyses of contact problems.
文摘Based on the general methods in power flow calculation of power system and on conceptions and classifications of parallel algorithm, a new approach named Dynamic Asynchronous Parallel Algorithm that applies to the online analysis and real-time dispatching and controlling of large-scale power network was put forward in this paper. Its performances of high speed and dynamic following have been verified on IEEE-14 bus system.
文摘This paper presents an efficient parallel algorithm for the shortest-path problem in interval graph for computing shortest-paths in a weighted interval graph that runs in O(n) time with n intervals in a graph. A linear processor CRCW algorithm for determining the shortest-paths in an interval graphs is given.
文摘A new parallel algorithm is proposed for the knapsack problem where the method of divide and conquer is adopted. Based on an EREW-SIMD machine with shared memory, the proposed algorithm utilizes O(2 n/4 ) 1-ε processors, 0≤ ε ≤1, and O(2 n/2 ) memory to find a solution for the n -element knapsack problem in time O(2 n/4 (2 n/4 ) ε) . The cost of the proposed parallel algorithm is O(2 n/2 ) , which is an optimal method for solving the knapsack problem without memory conflicts and an improved result over the past researches.
文摘The solution of linear equation group can be applied to the oil exploration, the structure vibration analysis, the computational fluid dynamics, and other fields. When we make the in-depth analysis of some large or very large complicated structures, we must use the parallel algorithm with the aid of high-performance computers to solve complex problems. This paper introduces the implementation process having the parallel with sparse linear equations from the perspective of sparse linear equation group.
基金the National Natural Science Foundation of China under Grant No. 60671033.
文摘On the basis of Floyd algorithm with the extended path matrix, a parallel algorithm which resolves all-pair shortest path (APSP) problem on cluster environment is analyzed and designed. Meanwhile, the parallel APSP pipelining algorithm makes full use of overlapping technique between computation and communication. Compared with broadcast operation, the parallel algorithm reduces communication cost. This algorithm has been implemented on MPI on PC-cluster. The theoretical analysis and experimental results show that the parallel algorithm is an efficient and scalable algorithm.
文摘A parallel arithmetic program for the molecular dynamics (MD) simulation study of a large sized system consisting of 50 000100 000 atoms of liquid metals is reformed, based on the cascade arithmetic program used for the molecular dynamics simulation study of a small sized system consisting of 5001 000 atoms. The program is used to simulate the rapid solidification processes of liquid metal Al system. Some new results, such as larger clusters composed of more than 36 smaller clusters (icosahedra or defect icosahedra) obtained in the system of 50 000 atoms, however, the larger clusters can not be seen in the small sized system of 5001 000 atoms. On the other hand, the results from this simulation study would be more closed to the real situation of the system under consideration because the influence of boundary conditions is decreased remarkably. It can be expected that from the parallel algorithm combined with the higher performance super computer, the total number of atoms in simulation system can be enlarged again up to tens, even hundreds times in the near future.
文摘In Additive Manufacturing field, the current researches of data processing mainly focus on a slicing process of large STL files or complicated CAD models. To improve the efficiency and reduce the slicing time, a parallel algorithm has great advantages. However, traditional algorithms can't make full use of multi-core CPU hardware resources. In the paper, a fast parallel algorithm is presented to speed up data processing. A pipeline mode is adopted to design the parallel algorithm. And the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, effects of threads number and layers number are investigated by a serial of experiments. The experimental results show that the threads number and layers number are two remarkable factors to the speedup ratio. The tendency of speedup versus threads number reveals a positive relationship which greatly agrees with the Amdahl's law, and the tendency of speedup versus layers number also keeps a positive relationship agreeing with Gustafson's law. The new algorithm uses topological information to compute contours with a parallel method of speedup. Another parallel algorithm based on data parallel is used in experiments to show that pipeline parallel mode is more efficient. A case study at last shows a suspending performance of the new parallel algorithm. Compared with the serial slicing algorithm, the new pipeline parallel algorithm can make full use of the multi-core CPU hardware, accelerate the slicing process, and compared with the data parallel slicing algorithm, the new slicing algorithm in this paper adopts a pipeline parallel model, and a much higher speedup ratio and efficiency is achieved.
文摘The parallel algorithms of iterated defect correction methods (PIDeCM’s) are constructed, which are of efficiency and high order B-convergence for general nonlinear stiff systems in ODE’S. As the basis of constructing and discussing PIDeCM’s. a class of parallel one-leg methods is also investigated, which are of particular efficiency for linear systems.