This study explores the application of parallel algorithms to large-scale sorting, focusing on QuickSort. The algorithm is implemented in both sequential and parallel forms, and the paper compares their performance in detail, examining array generation and pivot selection across datasets of varying sizes. Timings measured with the C++ chrono library were 16,499.2 milliseconds for the serial implementation and 16,339 milliseconds for the parallel implementation when sorting an array. These results suggest that the parallel approach offers little gain over its serial counterpart on smaller datasets, with more substantial benefits expected as the dataset size increases.
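The serial-versus-parallel comparison described above can be sketched as follows. This is only an illustration, not the paper's code: the paper used C++ and the chrono library, whereas this sketch uses Python's `time.perf_counter` and a thread pool, and the names `quicksort`, `parallel_quicksort`, and `time_ms`, the middle-element pivot, and the array size are all assumptions. Because of CPython's global interpreter lock, threads show the task structure rather than a real speedup for this CPU-bound work.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def quicksort(values):
    """Sequential QuickSort returning a new sorted list (middle element as pivot)."""
    if len(values) <= 1:
        return list(values)
    pivot = values[len(values) // 2]
    smaller = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    larger = [v for v in values if v > pivot]
    return quicksort(smaller) + equal + quicksort(larger)


def parallel_quicksort(values, pool):
    """Partition once around the pivot, then sort the two sides as separate tasks."""
    if len(values) <= 1:
        return list(values)
    pivot = values[len(values) // 2]
    smaller = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    larger = [v for v in values if v > pivot]
    left = pool.submit(quicksort, smaller)
    right = pool.submit(quicksort, larger)
    return left.result() + equal + right.result()


def time_ms(fn, *args):
    """Return (result, elapsed wall-clock milliseconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    random.seed(0)
    data = [random.randint(0, 10**6) for _ in range(100_000)]  # hypothetical size
    _, serial_ms = time_ms(quicksort, data)
    with ThreadPoolExecutor(max_workers=2) as pool:
        _, parallel_ms = time_ms(parallel_quicksort, data, pool)
    print(f"serial: {serial_ms:.1f} ms, parallel: {parallel_ms:.1f} ms")
```

Partitioning only once keeps task-creation overhead small; a deeper task tree would add scheduling cost, which is one reason gains can be modest on small inputs.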
A distribution network plays an extremely important role in the safe and efficient operation of a power grid. As the core part of a power grid's operation, a distribution network has a significant impact on the safety and reliability of residential electricity consumption. It is therefore necessary to actively plan and modify the distribution network's structure, improve its quality, and optimize its planning, so that the network can be fully utilized to meet electricity demand. In this paper, a distribution network grid planning algorithm based on the reliability of electricity consumption is developed using the ant colony algorithm. For the structure planning of distribution networks with dual power sources, the parallel ant colony algorithm is used to show that the premise of parallelism is the interactive process of ant colonies, and a dual-power distribution network structure model is established based on the principle of lowest cost. The artificial ants in the algorithm are compared with real ants in nature, and the basic steps and working principle of the ant colony optimization algorithm are studied with the help of the travelling salesman problem (TSP). The limitations of the ant colony algorithm are then analyzed, and an improvement strategy is proposed and evaluated by digital simulation in Python. The results demonstrate the reliability of the model and of the improved algorithm.
A new era of data access and management has begun with the use of cloud computing in the healthcare industry. Despite the efficiency and scalability that the cloud provides, the security of private patient data is still a major concern. Encryption, network security, and adherence to data protection laws are key to ensuring the confidentiality and integrity of healthcare data in the cloud. The computational overhead of encryption technologies could lead to delays in data access and processing rates. To address these challenges, we introduce the Enhanced Parallel Multi-Key Encryption Algorithm (EPM-KEA), aiming to bolster healthcare data security and facilitate the secure storage of critical patient records in the cloud. The data were gathered from two categories: Authorization for Hospital Admission (AIH) and Authorization for High Complexity Operations. We use Z-score normalization for preprocessing. The primary goal of implementing encryption techniques is to secure and store massive amounts of data on the cloud. In our analysis using specific parameters, including Execution time (42%), Encryption time (45%), Decryption time (40%), Security level (97%), and Energy consumption (53%), the system demonstrated favorable performance when compared to the traditional method. This suggests that addressing these security concerns could make cloud storage solutions for safeguarding healthcare data more widely accessible.
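The Z-score preprocessing step mentioned above rescales each feature to zero mean and unit standard deviation. A minimal sketch follows, assuming the population standard deviation and a plain list of numbers; the function name `z_score_normalize` and the sample values are hypothetical, and the abstract does not say which variant the authors used.

```python
import statistics


def z_score_normalize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical feature column
    print(z_score_normalize(sample))
```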
A recommender system (RS) relying on latent factor analysis usually adopts stochastic gradient descent (SGD) as its learning algorithm. However, owing to its serial mechanism, an SGD algorithm suffers from low efficiency and scalability when handling large-scale industrial problems. To address this issue, this study proposes a momentum-incorporated parallel stochastic gradient descent (MPSGD) algorithm, whose main idea is two-fold: a) implementing parallelization via a novel data-splitting strategy, and b) accelerating convergence by integrating momentum effects into its training process. With it, an MPSGD-based latent factor (MLF) model is achieved, which is capable of performing efficient and high-quality recommendations. Experimental results on four high-dimensional and sparse matrices generated by industrial RS indicate that, owing to the MPSGD algorithm, an MLF model outperforms existing state-of-the-art models in both computational efficiency and scalability.
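The momentum part of this idea can be illustrated with a small serial sketch of a latent factor model trained by SGD with a classical (heavy-ball) momentum term. It is not the MPSGD algorithm itself: the data-splitting strategy is not described in the abstract and is not reproduced, and the function names `train_mlf` and `rmse`, the hyperparameters, and the toy ratings are all assumptions.

```python
import math
import random


def train_mlf(ratings, n_users, n_items, rank=2, lr=0.01, reg=0.02,
              gamma=0.5, epochs=500, seed=0):
    """Fit user factors P and item factors Q on (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_items)]
    VP = [[0.0] * rank for _ in range(n_users)]  # momentum (velocity) for P
    VQ = [[0.0] * rank for _ in range(n_items)]  # momentum (velocity) for Q
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][k] * Q[i][k] for k in range(rank))
            for k in range(rank):
                p, q = P[u][k], Q[i][k]
                # velocity = gamma * old velocity + learning-rate-scaled descent step
                VP[u][k] = gamma * VP[u][k] + lr * (err * q - reg * p)
                VQ[i][k] = gamma * VQ[i][k] + lr * (err * p - reg * q)
                P[u][k] += VP[u][k]
                Q[i][k] += VQ[i][k]
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error of the factor model on the given triples."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(a * b for a, b in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return math.sqrt(total / len(ratings))


if __name__ == "__main__":
    toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 2.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]  # hypothetical data
    P, Q = train_mlf(toy, n_users=4, n_items=3)
    print(f"training RMSE: {rmse(toy, P, Q):.4f}")
```

Setting `gamma=0.0` recovers plain SGD, which makes it easy to compare convergence with and without momentum on the same data.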
In this paper, it is supposed that the B&B algorithm finds the first optimal solution after h nodes have been expanded and m active nodes have been created in the state-space tree. Then the lower bound Ω(m + h log h) on the running time of the general sequential B&B algorithm and the lower bound Ω(m/p + h log p) for the general parallel best-first B&B algorithm on the PRAM-CREW model are proposed, where p is the number of processors available. Moreover, the lower bound Ω(M/p + H + (H/p) log (H/p)) is presented for parallel algorithms on distributed-memory systems, where M and H represent the total number of active nodes and of expanded nodes processed by the p processors, respectively. In addition, a nearly fastest general parallel best-first B&B algorithm is put forward. The parallel algorithm is the fastest one when p = max{h^ε, r}, where ε = 1/√(log h) and r is the largest branch number of the nodes in the state-space tree.
In this paper, a class of real-time parallel modified Rosenbrock methods for numerical simulation is constructed for stiff dynamic systems on a multiprocessor system, and the convergence and numerical stability of these methods are discussed. In particular, an A-stable two-stage third-order real-time parallel formula and an A(α)-stable three-stage fourth-order real-time parallel formula with α ≈ 89.96° are given. Numerical simulation experiments in a parallel environment show that the class of algorithms is efficient and applicable, with greater speedup.
In this paper, a parallel Surface Extraction from Binary Volumes with Higher-Order Smoothness (SEBVHOS) algorithm is proposed to accelerate SEBVHOS execution. The original SEBVHOS algorithm is parallelized first, and then several performance optimization techniques (loop optimization, cache optimization, false sharing optimization, synchronization overhead optimization, and thread affinity optimization) are used to improve the implementation's performance on multi-core systems. The performance of the parallel SEBVHOS algorithm is analyzed on a dual-core system. The experimental results show that the parallel SEBVHOS algorithm achieves an average speedup of 1.86x. More importantly, the method introduces no additional aliasing artifacts compared to the original SEBVHOS algorithm.
Based on local algorithms, some parallel finite element (FE) iterative methods for stationary incompressible magnetohydrodynamics (MHD) are presented. These approaches, built on a two-grid technique, comprise two major phases: first, find the FE solution by solving the nonlinear system on a globally coarse mesh to capture the low-frequency component of the solution; then, locally solve linearized residual subproblems by one of three iterations (Stokes-type, Newton, and Oseen-type) on subdomains with a fine grid in parallel to approximate the high-frequency component. Optimal error estimates with regard to the two mesh sizes and the iterative steps of the proposed algorithms are given. Some numerical examples are implemented to verify the algorithm.
In this paper, a parallel simulation algorithm for the control problem in differential-algebraic systems is presented. The error of the algorithm is estimated. A stability analysis is carried out for a model problem and the stability region is given. A numerical example demonstrates that the method is efficient.
In the field of Additive Manufacturing, current research on data processing mainly focuses on the slicing of large STL files or complicated CAD models. A parallel algorithm has great advantages for improving efficiency and reducing slicing time. However, traditional algorithms cannot make full use of multi-core CPU hardware resources. In this paper, a fast parallel algorithm is presented to speed up data processing. A pipeline mode is adopted to design the parallel algorithm, and the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, the effects of the number of threads and the number of layers are investigated through a series of experiments. The experimental results show that the number of threads and the number of layers are two significant factors in the speedup ratio. Speedup increases with the number of threads, in close agreement with Amdahl's law, and also increases with the number of layers, in agreement with Gustafson's law. The new algorithm uses topological information to compute contours in parallel. Another parallel algorithm based on data parallelism is used in the experiments to show that the pipeline parallel mode is more efficient. A final case study demonstrates the performance of the new parallel algorithm. Compared with the serial slicing algorithm, the new pipeline parallel algorithm makes full use of multi-core CPU hardware and accelerates the slicing process; compared with the data-parallel slicing algorithm, it achieves a much higher speedup ratio and efficiency.
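The two laws cited above give different predictions for speedup. Amdahl's law fixes the problem size and bounds speedup by the serial fraction, while Gustafson's law lets the problem grow with the number of workers. The sketch below states both in their standard forms; the function names and the 0.9 parallel fraction are hypothetical and are not taken from the paper's measurements.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Fixed-size speedup: 1 / ((1 - f) + f / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction, workers):
    """Scaled-size speedup: (1 - f) + f * n, with f measured on the parallel run."""
    return (1.0 - parallel_fraction) + parallel_fraction * workers


if __name__ == "__main__":
    f = 0.9  # hypothetical parallel fraction
    for n in (1, 2, 4, 8):
        print(n, round(amdahl_speedup(f, n), 3), round(gustafson_speedup(f, n), 3))
```

Under Amdahl's law the speedup can never exceed 1 / (1 - f) however many threads are added, which is why adding layers (more work per run) is the setting where Gustafson's law applies.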
To address premature convergence in the search process of genetic algorithms, a chaotic migration-based pseudo parallel genetic algorithm (CMPPGA) is proposed. It applies the ideas of isolated evolution and information exchange from the distributed Parallel Genetic Algorithm within a serial program structure to solve optimization problems with low real-time demand. In this algorithm, the asynchronous migration of individuals during parallel evolution is guided by a chaotic migration sequence. Because the sequence is ergodic and stochastic, information exchange among sub-populations is efficient and sufficient. A simulation study of CMPPGA shows its strong global search ability, its superiority to the standard genetic algorithm, and its high immunity to premature convergence. Based on raw material supply practice, an inventory programming model is set up and solved by CMPPGA with satisfactory results.
Based on domain decomposition, a parallel two-level finite element method for the stationary Navier-Stokes equations is proposed and analyzed. The basic idea of the method is first to solve the Navier-Stokes equations on a coarse grid, then to solve the resulting residual equations in parallel on a fine grid. The method has low communication complexity and can be implemented easily. Using local a priori error estimates for finite element discretizations, error bounds for the approximate solution are derived. Numerical results are also given to illustrate the high efficiency of the method.
In this article, two kinds of expandable parallel finite element methods, based on two-grid discretizations, are given to solve linear elliptic problems. Compared with the classical local and parallel finite element methods, the methods shown in this article have two attractive features: 1) a partition of unity is used to generate a series of local and independent subproblems and to guarantee that the final approximation is globally continuous; 2) the computational domain of each local subproblem is contained in a ball with radius O(H) (H is the coarse mesh parameter), which makes the methods more suitable for parallel computing on a large parallel computer system. A priori error estimates are obtained, and optimal error bounds in both the H^1-norm and the L^2-norm are derived. Finally, numerical results are reported to verify the feasibility and validity of the methods.
This paper describes an efficient solution for parallelizing software program instructions, regardless of the programming language in which they are written. We solve the problem of the optimal distribution of a set of instructions on available processors. We propose a genetic algorithm to parallelize computations, using evolution to search the solution space. The stages of our proposed genetic algorithm are the choice of the initial population and its representation in chromosomes, the crossover, and the mutation operations customized to the problem being dealt with. In this paper, genetic algorithms are applied to the entire search space of the program-instruction parallelization problem. This problem is NP-complete, so no polynomial-time algorithm is known that can scan the solution space and solve it. The genetic algorithm-based method is general, and it is simple and efficient to implement because it can be scaled to a larger or smaller number of instructions that must be parallelized. The parallelization technique proposed in this paper was developed in the C# programming language, and our results confirm the effectiveness of our parallelization method. Experimental results obtained and presented for different working scenarios confirm the theoretical results, and they provide insight on how to improve the exploration of a search space that is too large to be searched exhaustively.
We introduce the work on parallel problem solvers from physics and biology being developed by the research team at the State Key Laboratory of Software Engineering, Wuhan University. Results on parallel solvers cover the following areas: evolutionary algorithms based on imitating the evolution processes of nature for parallel problem solving, especially for parallel optimization and model-building; and asynchronous parallel algorithms based on domain decomposition, inspired by physical analogies such as the elastic relaxation process and the annealing process, for scientific computations, especially for solving nonlinear mathematical physics problems. All these algorithms share the following characteristics: inherent parallelism, self-adaptation, and self-organization, because the basic ideas of these solvers come from imitating natural evolutionary processes.
Based on the full domain partition, a parallel finite element algorithm for the stationary Stokes equations is proposed and analyzed. In this algorithm, each subproblem is defined on the entire domain, with the majority of its degrees of freedom associated with the relevant subdomain. Therefore, it can be solved in parallel with other subproblems using an existing sequential solver without extensive recoding. This allows the algorithm to be implemented easily with low communication costs. Numerical results are given showing the high efficiency of the parallel algorithm.
In this paper, a parallel iterative algorithm for solving finite element equations is presented. The parallel computational steps of the method are based on the iterative solution of linear algebraic equations. By using the weighted residual method and choosing appropriate weighting functions, the basic finite element form of the parallel algorithm is derived. The algorithm has been implemented on the ELXSI-6400 parallel computer of Xi'an Jiaotong University. The computational results show that the operational speed is raised and the CPU time is cut down effectively. The method is therefore an effective parallel algorithm for solving the finite element equations of large-scale structures.
For a large-scale adaptive array, the heavy computational load and the high-rate data transmission are two challenges in the implementation of an adaptive digital beamforming system. An efficient parallel digital beamforming (DBF) algorithm based on the least mean square algorithm (PLMS) is proposed. An appropriate method is found to partition the least mean square (LMS) algorithm into a number of operational modules, which can be easily executed in a distributed-parallel-processing fashion. As a result, the proposed PLMS algorithm provides an effective solution that can alleviate the bottleneck of high-rate data transmission and reduce the computational cost. PLMS requires less computation than the conventional parallel algorithms based on the recursive least square (RLS) algorithm, and it is easier to implement for real-time adaptive array processing. Moreover, low sidelobes in the beam pattern are obtained by constraining the static steering vector with Tschebyscheff coefficients. Finally, a scheme for running the PLMS algorithm on a distributed-parallel-processing system is also proposed. The simulation results demonstrate that the PLMS algorithm has the same interference cancellation performance as the conventional LMS algorithm. Moreover, the PLMS algorithm obtains the same good beamforming performance regardless of how the algorithm is partitioned. It is expected that the proposed algorithm will be used in large-scale adaptive array systems for real-time adaptive digital beamforming.
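For orientation, the conventional complex LMS weight update that PLMS builds on can be sketched as below. This shows only the textbook recursion (output y = w^H x, error e = d - y, update w ← w + μ x e*), not the paper's partition into modules or its sidelobe constraint, neither of which is specified in the abstract. The function name `lms_step`, the step size, and the two-element snapshot are hypothetical.

```python
def lms_step(weights, snapshot, desired, mu):
    """One complex LMS iteration; returns (new_weights, output, error)."""
    # Beamformer output y = w^H x (conjugate the weights, then dot with the snapshot).
    output = sum(w.conjugate() * x for w, x in zip(weights, snapshot))
    error = desired - output
    # Stochastic-gradient update: w <- w + mu * x * conj(e).
    new_weights = [w + mu * x * error.conjugate()
                   for w, x in zip(weights, snapshot)]
    return new_weights, output, error


if __name__ == "__main__":
    weights = [0j, 0j]
    snapshot = [1 + 0j, 1j]  # hypothetical two-element array snapshot
    desired = 1 + 0j         # hypothetical reference signal sample
    for _ in range(200):
        weights, output, error = lms_step(weights, snapshot, desired, mu=0.1)
    print(abs(error))
```

Each weight update depends only on its own element's sample and the shared scalar error, which is the property that makes the recursion amenable to splitting across processing modules.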
Several researchers have recently proposed different methods for revising propositional knowledge bases, but all of these methods are intractable in the general case. For practical application, this paper presents a revision method for a special case and gives a corresponding polynomial algorithm as well as its parallel version on the CREW PRAM.
In this paper, the stability analysis for parallel real-time digital simulation models is discussed. The coupling coefficient perturbation method and the simulation stepsize perturbation method are established. For two classes of systems of test equations, we construct the parallel simulation models and prove that they have stability behaviour similar to that of the original continuous systems.
Funding (momentum-incorporated parallel SGD paper): Supported in part by the National Natural Science Foundation of China (61772493), the Deanship of Scientific Research (DSR) at King Abdulaziz University (RG-48-135-40), the Guangdong Province Universities and College Pearl River Scholar Funded Scheme (2019), and the Natural Science Foundation of Chongqing (cstc2019jcyjjqX0013).
Funding (B&B running-time lower bounds paper): Supported by the Ph.D. Foundation of the State Education Commission of China.
Funding (real-time parallel modified Rosenbrock methods paper): Supported by the National Natural Science Foundation of China (No. 19871080).
Funding (parallel SEBVHOS paper): Supported by the National Natural Science Foundation of China (No. 61071173).
Funding (parallel FE iterative methods for stationary incompressible MHD paper): Supported by the National Natural Science Foundation of China (Nos. 11971410 and 12071404), the Natural Science Foundation of Hunan Province of China (No. 2019JJ40279), the Excellent Youth Program of Scientific Research Project of Hunan Provincial Department of Education (Nos. 18B064 and 20B564), the China Postdoctoral Science Foundation (Nos. 2018T110073 and 2018M631402), and the International Scientific and Technological Innovation Cooperation Base of Hunan Province for Computational Science (No. 2018WK4006).
文摘Based on local algorithms,some parallel finite element(FE)iterative methods for stationary incompressible magnetohydrodynamics(MHD)are presented.These approaches are on account of two-grid skill include two major phases:find the FE solution by solving the nonlinear system on a globally coarse mesh to seize the low frequency component of the solution,and then locally solve linearized residual subproblems by one of three iterations(Stokes-type,Newton,and Oseen-type)on subdomains with fine grid in parallel to approximate the high frequency component.Optimal error estimates with regard to two mesh sizes and iterative steps of the proposed algorithms are given.Some numerical examples are implemented to verify the algorithm.
文摘In this paper, a parallel simulation algorithm for the control problem in differential algebraic system is presented. The error of the algorithm is estimated. The stability analysis is made for a model problem and the stability region is given. The numerical example demonstrates that the method is efficient.
Abstract: In the field of additive manufacturing, current research on data processing mainly focuses on the slicing of large STL files or complicated CAD models. A parallel algorithm offers great advantages for improving efficiency and reducing slicing time. However, traditional algorithms cannot make full use of multi-core CPU hardware resources. In this paper, a fast parallel algorithm is presented to speed up data processing. A pipeline mode is adopted to design the parallel algorithm, and the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, the effects of the number of threads and the number of layers are investigated through a series of experiments. The experimental results show that the number of threads and the number of layers are two significant factors for the speedup ratio. Speedup increases with the number of threads, in close agreement with Amdahl's law, and also increases with the number of layers, in agreement with Gustafson's law. The new algorithm uses topological information to compute contours in parallel. Another parallel algorithm based on data parallelism is used in the experiments to show that the pipeline parallel mode is more efficient. Finally, a case study shows the performance of the new parallel algorithm. Compared with the serial slicing algorithm, the new pipeline parallel algorithm makes full use of multi-core CPU hardware and accelerates the slicing process; compared with the data-parallel slicing algorithm, it achieves a much higher speedup ratio and efficiency.
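The two scaling laws cited in this abstract can be written down in a few lines. The sketch below is only an illustration: the parallel fraction of 0.9 is an assumed value, not a figure from the paper, and the function names are hypothetical. Amdahl's law models a fixed problem size as workers are added, while Gustafson's law models a problem that grows with the number of workers (here, loosely, more layers to slice).

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Fixed-size speedup: S = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction: float, workers: int) -> float:
    """Scaled-size speedup: S = (1 - p) + p * n."""
    return (1.0 - parallel_fraction) + parallel_fraction * workers


if __name__ == "__main__":
    p = 0.9  # assumed parallel fraction, for illustration only
    for n in (1, 2, 4, 8):
        print(n, round(amdahl_speedup(p, n), 3), round(gustafson_speedup(p, n), 3))
```

Under Amdahl's law the speedup saturates at 1 / (1 - p) however many threads are used, whereas the Gustafson estimate keeps growing linearly with the worker count.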
Abstract: To address premature convergence in the search process of genetic algorithms, a chaotic migration-based pseudo-parallel genetic algorithm (CMPPGA) is proposed. It applies the ideas of isolated evolution and information exchange from the distributed parallel genetic algorithm within a serial program structure, to solve optimization problems with low real-time demands. In this algorithm, the asynchronous migration of individuals during parallel evolution is guided by a chaotic migration sequence. Because the sequence is ergodic and stochastic, information exchange among sub-populations is efficient and sufficient. A simulation study of CMPPGA shows its strong global search ability, its superiority to the standard genetic algorithm, and its high immunity to premature convergence. Based on raw material supply practice, an inventory programming model is set up and solved by CMPPGA with satisfactory results.
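As a rough illustration of how a chaotic sequence might steer migration between sub-populations, the sketch below uses the logistic map with r = 4. This is an assumption: the abstract does not name the chaotic map, and `logistic_sequence` and `migration_plan` are hypothetical helpers, not the authors' CMPPGA.

```python
def logistic_sequence(x0: float, length: int, r: float = 4.0) -> list:
    """Iterate the logistic map x <- r * x * (1 - x), which is chaotic for r = 4."""
    values = []
    x = x0
    for _ in range(length):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def migration_plan(num_subpops: int, generations: int, x0: float = 0.3) -> list:
    """Choose one (source, destination) sub-population pair per generation.

    Requires num_subpops >= 2. Two chaotic values are consumed per generation:
    one selects the source, the other selects a non-zero offset to the
    destination, so the two indices always differ.
    """
    seq = logistic_sequence(x0, 2 * generations)
    plan = []
    for g in range(generations):
        src = min(int(seq[2 * g] * num_subpops), num_subpops - 1)
        offset = 1 + min(int(seq[2 * g + 1] * (num_subpops - 1)), num_subpops - 2)
        dst = (src + offset) % num_subpops
        plan.append((src, dst))
    return plan


if __name__ == "__main__":
    for generation, (src, dst) in enumerate(migration_plan(4, 10)):
        print(f"generation {generation}: migrate from sub-population {src} to {dst}")
```

Because the plan is generated inside a single loop, the sub-populations can be evolved one after another in a serial program, which matches the "pseudo-parallel" structure the abstract describes.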
Funding: Project supported by the National Natural Science Foundation of China (No. 11001061) and the Science and Technology Foundation of Guizhou Province of China (No. [2008]2123).
Abstract: Based on domain decomposition, a parallel two-level finite element method for the stationary Navier-Stokes equations is proposed and analyzed. The basic idea of the method is first to solve the Navier-Stokes equations on a coarse grid, and then to solve the resulting residual equations in parallel on a fine grid. The method has low communication complexity and can be implemented easily. Using a local a priori error estimate for finite element discretizations, error bounds for the approximate solution are derived. Numerical results are also given to illustrate the high efficiency of the method.
Funding: Subsidized by NSFC (11701343) and partially supported by NSFC (11571274, 11401466).
Abstract: In this article, two kinds of expandable parallel finite element methods, based on two-grid discretizations, are given to solve linear elliptic problems. Compared with the classical local and parallel finite element methods, the methods in this article have two attractive features: 1) a partition of unity is used to generate a series of local and independent subproblems, which guarantees that the final approximation is globally continuous; 2) the computational domain of each local subproblem is contained in a ball of radius O(H) (H is the coarse mesh parameter), which makes the methods more suitable for parallel computing on a large parallel computer system. Some a priori error estimates are obtained, and optimal error bounds in both the H^1-norm and the L^2-norm are derived. Finally, numerical results are reported to verify the feasibility and validity of the methods.
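The role of the partition of unity can be seen in a one-dimensional toy setting. The sketch below assumes piecewise-linear hat functions on a uniform coarse mesh of size H; the paper's actual construction and dimension are not given in the abstract, and `hat`, `partition_of_unity`, and `glue` are hypothetical names. Each hat function has support of radius H around a coarse node, and the weights sum to 1, so blending local solutions with them yields a globally continuous function.

```python
def hat(x: float, center: float, H: float) -> float:
    """Piecewise-linear hat function with support [center - H, center + H]."""
    return max(0.0, 1.0 - abs(x - center) / H)


def partition_of_unity(x: float, nodes: list, H: float) -> list:
    """Weights of all hat functions at x; they sum to 1 on [nodes[0], nodes[-1]]."""
    return [hat(x, c, H) for c in nodes]


def glue(x: float, nodes: list, H: float, local_solutions: list) -> float:
    """Blend local solutions u_i into sum_i phi_i(x) * u_i(x).

    Each u_i only matters where phi_i is non-zero, i.e. within distance H of
    its node, so the local problems can be solved independently.
    """
    return sum(hat(x, c, H) * u(x) for c, u in zip(nodes, local_solutions))


if __name__ == "__main__":
    H = 0.25
    nodes = [i * H for i in range(5)]  # coarse nodes on [0, 1]
    for x in (0.0, 0.1, 0.3, 0.77, 1.0):
        print(x, sum(partition_of_unity(x, nodes, H)))
```

If every local solution coincides with the same function, `glue` reproduces that function exactly, which is a quick sanity check on the weights.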
Abstract: This paper describes an efficient solution for parallelizing software program instructions, regardless of the programming language in which they are written. We solve the problem of the optimal distribution of a set of instructions across the available processors. We propose a genetic algorithm to parallelize computations, using evolution to search the solution space. The stages of our proposed genetic algorithm are the choice of the initial population and its representation in chromosomes, followed by crossover and mutation operations customized to the problem at hand. In this paper, genetic algorithms are applied to the entire search space of the program-instruction parallelization problem. This problem is NP-complete, so no polynomial-time algorithm is known that can scan the solution space and solve it. The genetic algorithm-based method is general, and it is simple and efficient to implement because it can be scaled to a larger or smaller number of instructions that must be parallelized. The parallelization technique proposed in this paper was developed in the C# programming language, and our results confirm the effectiveness of our parallelization method. Experimental results obtained for different working scenarios confirm the theoretical results and provide insight into how to improve the exploration of a search space that is too large to be searched exhaustively.
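The stages listed in this abstract (initial population, chromosome representation, crossover, mutation) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' C# implementation: instructions are treated as independent tasks with known costs, dependencies between instructions are ignored, fitness is the makespan (the load of the busiest processor), and every name and parameter value is hypothetical.

```python
import random


def makespan(assignment, costs, num_procs):
    """Load of the busiest processor; assignment[i] is the processor of task i."""
    loads = [0.0] * num_procs
    for task, proc in enumerate(assignment):
        loads[proc] += costs[task]
    return max(loads)


def genetic_schedule(costs, num_procs, pop_size=30, generations=100,
                     mutation_rate=0.1, seed=0):
    """Search for a low-makespan assignment; needs at least 2 tasks."""
    rng = random.Random(seed)
    n = len(costs)

    def fitness(assignment):
        return makespan(assignment, costs, num_procs)

    # Chromosome: one gene per task, holding the index of its processor.
    population = [[rng.randrange(num_procs) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[:pop_size // 2]  # keep the better half (elitism)
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = mother[:cut] + father[cut:]
            for i in range(n):  # per-gene mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(num_procs)
            children.append(child)
        population = survivors + children
    return min(population, key=fitness)


if __name__ == "__main__":
    task_costs = [4, 3, 3, 2, 2, 2, 1, 1]
    best = genetic_schedule(task_costs, num_procs=3)
    print(best, makespan(best, task_costs, 3))
```

The result is a heuristic schedule; nothing in the sketch guarantees that it is optimal.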
Funding: Supported by the National Natural Science Foundation of China (Nos. 60133010, 70071042, and 60073043) and the National Laboratory for Parallel and Distributed Processing.
Abstract: We introduce the work on parallel problem solvers from physics and biology being developed by the research team at the State Key Laboratory of Software Engineering, Wuhan University. Results on parallel solvers cover the following areas: evolutionary algorithms based on imitating the evolution processes of nature for parallel problem solving, especially for parallel optimization and model-building; and asynchronous parallel algorithms based on domain decomposition, inspired by physical analogies such as the elastic relaxation process and the annealing process, for scientific computations, especially for solving nonlinear mathematical physics problems. All these algorithms share the following characteristics: inherent parallelism, self-adaptation, and self-organization, because the basic ideas of these solvers come from imitating natural evolutionary processes.
Funding: Project supported by the National Natural Science Foundation of China (No. 10971166), the National Basic Research Program (No. 2005CB321703), and the Science and Technology Foundation of Guizhou Province of China (No. [2008]2123).
Abstract: Based on the full domain partition, a parallel finite element algorithm for the stationary Stokes equations is proposed and analyzed. In this algorithm, each subproblem is defined on the entire domain, with the majority of its degrees of freedom associated with the relevant subdomain. Therefore, it can be solved in parallel with the other subproblems using an existing sequential solver without extensive recoding. This allows the algorithm to be implemented easily with low communication costs. Numerical results are given showing the high efficiency of the parallel algorithm.
Funding: This work was carried out as part of a research project supported by the National Structural Strength & Vibration Laboratory of Xi'an Jiaotong University with the National Fund.
Abstract: In this paper, a parallel algorithm in iterative form for solving finite element equations is presented. Based on the iterative solution of linear algebraic equations, the parallel computational steps are introduced. By using the weighted residual method and choosing appropriate weighting functions, the basic finite element form of the parallel algorithm is derived. The program for this algorithm has been implemented on the ELXSI-6400 parallel computer of Xi'an Jiaotong University. The computational results show that the computing speed is increased and the CPU time is reduced effectively. The method is therefore an effective parallel algorithm for solving the finite element equations of large-scale structures.
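The abstract does not say which iterative scheme is used. As a generic illustration of why iterative solvers for linear algebraic equations parallelize well, the sketch below uses Jacobi iteration, which is an assumption made for this example only. Each component of the new iterate depends solely on the previous iterate, so all components could be computed concurrently.

```python
def jacobi(A, b, iterations=100, tol=1e-10):
    """Solve A x = b by Jacobi iteration (converges for diagonally dominant A)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Every entry of x_new reads only the old x, so the n updates
        # are independent and could be assigned to different processors.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    return x


if __name__ == "__main__":
    # Small diagonally dominant system whose exact solution is (1, 2, 3).
    A = [[4.0, -1.0, 0.0],
         [-1.0, 4.0, -1.0],
         [0.0, -1.0, 4.0]]
    b = [2.0, 4.0, 10.0]
    print(jacobi(A, b))
```

The tridiagonal matrix in the demo only mimics the banded structure typical of finite element stiffness matrices; it is not taken from the paper.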
Abstract: For a large-scale adaptive array, the heavy computational load and the high-rate data transmission are two challenges in implementing an adaptive digital beamforming system. An efficient parallel digital beamforming (DBF) algorithm based on the least mean square (LMS) algorithm, called PLMS, is proposed. A method is found to partition the LMS algorithm into a number of operational modules that can easily be executed in a distributed, parallel fashion. As a result, the proposed PLMS algorithm alleviates the bottleneck of high-rate data transmission and reduces the computational cost. PLMS requires less computation than the conventional parallel algorithms based on the recursive least squares (RLS) algorithm, and it is easier to implement for real-time adaptive array processing. Moreover, low sidelobes in the beam pattern are obtained by constraining the static steering vector with Tschebyscheff coefficients. Finally, a scheme for running the PLMS algorithm on a distributed parallel processing system is also proposed. The simulation results demonstrate that the PLMS algorithm has the same interference cancellation performance as the conventional LMS algorithm. Moreover, the PLMS algorithm achieves the same beamforming performance regardless of how the algorithm is partitioned. The proposed algorithm is expected to be used in large-scale adaptive array systems for real-time adaptive digital beamforming.
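One way to see why an LMS filter can be split into modules without changing its result is sketched below. This is not the authors' PLMS scheme: it is a real-valued toy (beamforming normally uses complex data), the contiguous block partition is an assumption, and all names are hypothetical. Each module computes a partial output from its own weights, only the scalar error is shared, and each module then updates its own weights.

```python
import random


def lms(inputs, desired, num_taps, mu):
    """Conventional LMS: w <- w + mu * e * x, with e = d - w . x."""
    w = [0.0] * num_taps
    for x, d in zip(inputs, desired):
        e = d - sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]
    return w


def partitioned_lms(inputs, desired, num_taps, mu, num_modules):
    """Same update, with the weights split into contiguous modules."""
    bounds = [num_taps * k // num_modules for k in range(num_modules + 1)]
    blocks = [range(bounds[k], bounds[k + 1]) for k in range(num_modules)]
    w = [0.0] * num_taps
    for x, d in zip(inputs, desired):
        # Each module needs only its own slice of x and w for its partial output.
        partial = [sum(w[i] * x[i] for i in block) for block in blocks]
        e = d - sum(partial)  # the only value exchanged between modules
        for block in blocks:  # module-local weight updates
            for i in block:
                w[i] += mu * e * x[i]
    return w


def make_data(true_weights, samples, seed=1):
    """Noise-free synthetic snapshots and the matching desired outputs."""
    rng = random.Random(seed)
    inputs = [[rng.uniform(-1.0, 1.0) for _ in true_weights] for _ in range(samples)]
    desired = [sum(t * xi for t, xi in zip(true_weights, x)) for x in inputs]
    return inputs, desired


if __name__ == "__main__":
    target = [0.5, -1.0, 2.0, 0.25]
    xs, ds = make_data(target, 500)
    print(lms(xs, ds, 4, 0.1))
    print(partitioned_lms(xs, ds, 4, 0.1, num_modules=2))
```

Because only one scalar per sample crosses module boundaries in this sketch, the data exchanged between modules stays small however the weights are divided.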
Abstract: Several researchers have recently proposed different methods for revising propositional knowledge bases, but all of these methods are intractable in the general case. For practical application, this paper presents a revision method for a special case and gives a corresponding polynomial algorithm as well as its parallel version on a CREW PRAM.
Funding: This work is supported partly by the National Natural Science Foundation of China.
Abstract: In this paper, the stability analysis for parallel real-time digital simulation models is discussed. The coupling coefficient perturbation method and the simulation stepsize perturbation method are established. For two classes of systems of test equations, we construct the parallel simulation models and prove that their stability behaviour is similar to that of the original continuous systems.