As a novel kind of particle method for explicit dynamics, the finite particle method (FPM) does not require the formation or solution of global matrices, and the evaluations of the element equivalent forces and particle displacements are decoupled in nature, thus making this method suitable for parallelization. The FPM also requires an acceleration strategy to overcome the heavy computational burden of its explicit framework for time-dependent dynamic analysis. To this end, a GPU-accelerated parallel strategy for the FPM is proposed in this paper. By taking advantage of the independence of each step of the FPM workflow, a generic parallelized computational framework for multiple types of analysis is established. Using the Compute Unified Device Architecture (CUDA), the GPU implementations of the main tasks of the FPM, such as evaluating and assembling the element equivalent forces and solving the kinematic equations for particles, are elaborated through careful thread management and memory optimization. Performance tests show that speedup ratios of 8, 25 and 48 are achieved for beams, hexahedral solids and triangular shells, respectively. For examples consisting of explicit dynamic analyses of shells and solids, comparisons with Abaqus using 1 to 8 CPU cores validate the accuracy of the results and demonstrate a maximum speed improvement of a factor of 11.2.
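The per-particle update is the part of the FPM pipeline that maps most directly onto one-thread-per-particle GPU execution. The paper's solver is a CUDA C/C++ implementation; purely as an illustrative sketch of that mapping, the following Python/Numba-CUDA kernel advances hypothetical particles with a generic explicit central-difference step. The array names, the undamped update formula, and the launch configuration are assumptions for illustration, not the authors' code.

```python
# Sketch only: one CUDA thread advances one particle, mirroring the decoupled
# particle update described in the abstract (assumed formula and names).
import numpy as np
from numba import cuda

@cuda.jit
def advance_particles(u_prev, u_curr, u_next, force, mass, dt):
    i = cuda.grid(1)                      # global thread index = particle index
    if i < u_curr.size:
        # undamped explicit central difference: u_{n+1} = 2*u_n - u_{n-1} + dt^2 * F/m
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + dt * dt * force[i] / mass[i]

n = 1_000_000
u_prev = cuda.to_device(np.zeros(n))
u_curr = cuda.to_device(np.zeros(n))
u_next = cuda.device_array(n)
force = cuda.to_device(np.random.rand(n))
mass = cuda.to_device(np.ones(n))

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
advance_particles[blocks, threads_per_block](u_prev, u_curr, u_next, force, mass, 1.0e-4)
```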
Satellite cloud-derived wind inversion is a large-scale, computation-intensive task, and the time-consuming serial inversion algorithm makes it difficult to break through the resulting efficiency bottleneck. In this paper, we propose a parallel acceleration scheme for the cloud-derived wind inversion algorithm based on MPI cluster parallel techniques. Following a divide-and-conquer strategy, the wind vector inversion tasks are assigned to the computing units according to a given policy, and each unit executes its assigned tasks in parallel, thereby reducing the long inversion time caused by serial accumulation. Within this MPI-based acceleration scheme, an algorithm based on performance prediction is proposed to effectively balance the load across the MPI cluster. Comparative analysis of experimental data obtained with this parallel framework shows that it clearly accelerates the cloud-derived wind inversion algorithm: the speedup of the MPI-based parallel algorithm reaches 14.96, which meets the expected estimate. In addition, this paper proposes an efficiency optimization algorithm for cloud-derived wind inversion; with minimal loss of wind vector accuracy, the optimized algorithm runs up to 13 times faster.
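The divide-and-conquer assignment of wind-vector inversion tasks can be pictured with a small mpi4py sketch: rank 0 splits the task list into blocks weighted by a predicted per-rank performance score and scatters them, which is one simple way to realize performance-prediction-based load balancing. The task list, the placeholder inversion function, and the uniform performance scores are all assumptions for illustration, not the scheme in the paper.

```python
# Illustrative mpi4py sketch: weighted block decomposition of inversion tasks.
from mpi4py import MPI

def invert_wind_vector(task):            # placeholder for the real inversion kernel
    return task * 2

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    tasks = list(range(10_000))          # all wind-vector inversion tasks (assumed)
    perf = [1.0] * size                  # predicted relative speed of each rank (assumed uniform)
    total = sum(perf)
    bounds = [0] + [int(len(tasks) * sum(perf[:r + 1]) / total) for r in range(size)]
    chunks = [tasks[bounds[r]:bounds[r + 1]] for r in range(size)]
else:
    chunks = None

my_tasks = comm.scatter(chunks, root=0)  # each rank receives its weighted block
my_results = [invert_wind_vector(t) for t in my_tasks]
all_results = comm.gather(my_results, root=0)   # rank 0 assembles the full wind field
```

With non-uniform performance scores measured or predicted per node, the same split gives a weighted assignment of the kind the load-balancing algorithm targets.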
Programmable graphics processing units (GPUs) bring teraflops of peak performance and thousands of cores to a personal desktop platform at the price of a conventional workstation. A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows using NVIDIA's Compute Unified Device Architecture (CUDA) programming model in the CUDA Fortran programming language. The implementation techniques for CUDA kernels, the double-layered thread hierarchy, and the various levels of the memory hierarchy are presented to form the GPU-based algorithm for the Euler/Navier-Stokes equations. The resulting parallel solver is validated on a set of typical test flow cases. The numerical results show that speedups of dozens of times relative to a serial CPU implementation can be achieved on a single-GPU desktop platform, which demonstrates that a GPU desktop can serve as a cost-effective parallel computing platform to substantially accelerate computational fluid dynamics (CFD) simulations.
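The "double-layered thread hierarchy" refers to CUDA's grid-of-blocks / threads-per-block organization. The solver above is written in CUDA Fortran; the sketch below only illustrates the same two-level decomposition of a 2-D field in Python with Numba CUDA, using a simple averaging stencil as a stand-in for a real flux evaluation.

```python
# Sketch of a 2-D block/thread decomposition; the stencil is a placeholder, not a CFD flux.
import numpy as np
from numba import cuda

@cuda.jit
def smooth(q, out):
    i, j = cuda.grid(2)                            # 2-D global index from block and thread ids
    if 0 < i < q.shape[0] - 1 and 0 < j < q.shape[1] - 1:
        out[i, j] = 0.25 * (q[i - 1, j] + q[i + 1, j] + q[i, j - 1] + q[i, j + 1])

q = cuda.to_device(np.random.rand(1024, 1024))
out = cuda.device_array_like(q)
threads = (16, 16)                                 # inner layer: threads per block
blocks = ((1024 + 15) // 16, (1024 + 15) // 16)    # outer layer: grid of blocks
smooth[blocks, threads](q, out)
```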
Data encryption is essential in securing data exchanged between connected parties. Encryption is the process of transforming readable text into scrambled, unreadable text using secure keys. Stream ciphers are one type of encryption algorithm that relies on a single key for both encryption and decryption. Many existing encryption algorithms are developed based on either a mathematical foundation or on other biological, social or physical behaviours. One technique is to utilise the behavioural aspects of game theory in a stream cipher. In this paper, we introduce an enhanced Deoxyribonucleic acid (DNA)-coded stream cipher based on an iterated n-player prisoner's dilemma paradigm. Our main goal is to add more layers of randomness to the behaviour of the keystream generation process; these layers are inspired by the behaviour of multiple players playing a prisoner's dilemma game. We implement parallelism to compensate for the additional processing time that may result from adding these extra layers of randomness. The results show that our enhanced design passes the statistical tests and achieves an encryption throughput of about 1,877 Mbit/s, which makes it a feasible secure stream cipher.
Multicomputer systems (distributed-memory computer systems) are becoming more and more popular and will be widely used in scientific research. In this paper, we present a parallel algorithm for the Fourier transform of a vector of complex numbers on a multicomputer system and report its computing times and speedup in a parallel environment supported by the EXPRESS system on a multicomputer consisting of four SGI workstations. Our analysis shows that the results are ideal and that this scheme is well suited to multicomputer systems.
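The decomposition idea can be illustrated with a short mpi4py sketch in which each process computes a disjoint slice of the output spectrum of a (naive, O(N²)) discrete Fourier transform and rank 0 assembles the result. The original work ran under the EXPRESS environment on four SGI workstations, so this is only an analogous modern sketch, not the paper's algorithm.

```python
# Each rank computes the DFT coefficients for its own slice of output frequencies.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N = 4096
x = np.exp(2j * np.pi * 5 * np.arange(N) / N) if rank == 0 else None
x = comm.bcast(x, root=0)                       # every rank needs the full input vector

ks = np.array_split(np.arange(N), size)[rank]   # this rank's output frequencies
n = np.arange(N)
X_local = np.array([np.sum(x * np.exp(-2j * np.pi * k * n / N)) for k in ks])

X_parts = comm.gather(X_local, root=0)
if rank == 0:
    X = np.concatenate(X_parts)                 # full spectrum assembled on rank 0
```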
Gamma is a kernel programming language with an elegant chemical-reaction metaphor in which programs are described in terms of multiset rewriting. The Gamma formalism allows one to describe an algorithm without introducing artificial sequentiality and naturally leads to the derivation of a parallel solution to a given problem. However, the difficulty of incorporating control strategies makes it hard to define sophisticated approaches in Gamma and impossible to reach a decent level of efficiency in any direct implementation. Recently, a higher-order multiset programming paradigm, named higher-order Gamma, was introduced by Metayer to alleviate these problems. In this paper, we investigate the possibility of implementing higher-order Gamma on MasPar, a massively data-parallel computer. The results show that a program written in higher-order Gamma can be transformed naturally into an efficient implementation on a real parallel machine.
Parallel computing assigns the computing model to different processors on different devices and executes it simultaneously. Accordingly, it has broad applications in the numerical simulation of geotechnical and underground engineering, whose models are always large-scale. With parallel computing, the computing time or the memory requirements can be reduced by splitting the original domain of the numerical model into many subdomains, an approach thus named the domain decomposition method. In this study, a cubic, equal-volume domain decomposition strategy was utilized to realize parallel computing on the distributed-memory system of the four-dimensional lattice spring model (4D-LSM) based on the message passing interface. With a more efficient communication strategy introduced, this study aimed at running a one-billion-particle model on a supercomputer platform. The preprocessing procedure of the parallelized 4D-LSM was restructured, and a particle generation strategy suitable for the supercomputer platform was employed to minimize the time consumed in preprocessing and calculation. On this basis, numerical calculations were performed on the TianHe-3 prototype E-class supercomputer at the National Supercomputer Center in Tianjin. Two field-scale three-dimensional blasting wave propagation models were carried out, and the numerical results verify the computing power and the advantage of the parallelized 4D-LSM in the simulation of large-scale three-dimensional models. Subsequently, the time complexity and spatial complexity of the 4D-LSM and other particle discrete element methods were analyzed.
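Domain decomposition of an explicit particle or lattice solver boils down to each rank updating its own subdomain and exchanging boundary (ghost) data with its neighbours every step. The 4D-LSM uses a cubic, equal-volume 3-D decomposition; the mpi4py sketch below shows only a 1-D, periodic analogue of that ghost-layer exchange, with a stand-in update rule and invented sizes.

```python
# 1-D periodic domain decomposition with ghost-cell exchange (illustrative only).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

local = np.full(1000 + 2, float(rank))   # interior cells plus one ghost cell on each side

for step in range(10):
    # ring shift: send my right-most interior value to the right neighbour while
    # receiving the matching value from the left neighbour, and vice versa
    local[0] = comm.sendrecv(local[-2], dest=right, source=left)
    local[-1] = comm.sendrecv(local[1], dest=left, source=right)
    local[1:-1] += 0.1 * (local[:-2] - 2.0 * local[1:-1] + local[2:])   # stand-in explicit update
```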
The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed memory systems. However, MPI implementations can contain defects that impact the reliability and performance of parallel applications. Detecting and correcting these defects is crucial, yet there is a lack of published models specifically designed for correcting MPI defects. To address this, we propose a model for detecting and correcting MPI defects (DC_MPI), which aims to detect and correct defects in various types of MPI communication, including blocking point-to-point (BPTP), nonblocking point-to-point (NBPTP), and collective communication (CC). The defects addressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC), and message mismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a dataset consisting of 40 MPI codes. The results indicate that the model achieved a detection rate of 37 out of 40 codes, resulting in an overall detection accuracy of 92.5%. Additionally, the execution duration of the DC_MPI model ranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful in detecting and correcting defects in MPI implementations, thereby enhancing the reliability and performance of parallel applications. The DC_MPI model fills an important research gap and provides a valuable tool for improving the quality of MPI-based parallel computing systems.
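As a concrete picture of one defect class the DC_MPI model targets, the sketch below (written with mpi4py, not taken from the paper or its dataset) contrasts a deadlock-prone blocking send/receive ordering between two ranks with a corrected version that uses a combined send-receive.

```python
# Textbook illustration of a blocking point-to-point deadlock pattern; run with 2 ranks.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = 1 - rank                       # assumes exactly two ranks
data = {"payload": [rank] * 100000}

UNSAFE = False
if UNSAFE:
    # Both ranks send first and then receive: with large messages and no buffering,
    # the two blocking sends can wait on each other forever (a DL defect).
    comm.send(data, dest=peer)
    incoming = comm.recv(source=peer)
else:
    # Corrected version: the combined send/receive lets MPI pair the operations safely.
    incoming = comm.sendrecv(data, dest=peer, source=peer)
```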
An efficient approach is proposed for the equivalent linearization of frame structures with plastic hinges under nonstationary seismic excitations. The concentrated plastic hinges, described by the Bouc-Wen model, are assumed to occur at the two ends of a linear-elastic beam element. The auxiliary differential equations governing the plastic rotational displacements and their corresponding hysteretic displacements are replaced with linearized differential equations. Then, the two sets of equations of motion for the original nonlinear system can be reduced to an expanded-order equivalent linearized equation of motion for the equivalent linear system. To solve the equation of motion of the equivalent linear system, nonstationary random vibration analysis is carried out with high efficiency based on the explicit time-domain method. Finally, the proposed treatment of the initial values of the equivalent parameters is investigated in conjunction with parallel computing technology, which provides a new way of obtaining the equivalent linear systems at different time instants. Based on the explicit time-domain method, the key responses of interest of the converged equivalent linear system can be calculated with high efficiency through dimension-reduction analysis. Numerical examples indicate that the proposed approach has high computational efficiency and shows good applicability to weakly nonlinear and medium-intensity nonlinear systems.
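For reference, the Bouc-Wen model used for the plastic hinges is commonly written, in one standard form with notation chosen here rather than taken from the paper, as a first-order evolution equation for the hysteretic variable z driven by the hinge rotation x:

```latex
\dot{z}(t) = A\,\dot{x}(t) - \beta\,\lvert\dot{x}(t)\rvert\,\lvert z(t)\rvert^{\,n-1} z(t) - \gamma\,\dot{x}(t)\,\lvert z(t)\rvert^{\,n}
```

where A, β, γ and n are shape parameters; it is this kind of nonlinear auxiliary equation that the equivalent linearization replaces with a linear relation in ż, ẋ and z.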
The geometry of joints has a significant influence on the mechanical properties of rocks. To simplify the curved joint shapes found in rocks, joints are usually treated as straight lines or planes in most laboratory experiments and numerical simulations. In this study, computerized tomography (CT) scanning and photogrammetry were employed to obtain the internal and surface joint structures of a limestone sample, respectively. To describe the joint geometry, edge detection algorithms and a three-dimensional (3D) matrix mapping method were applied to reconstruct CT-based and photogrammetry-based jointed rock models. For comparison, numerical uniaxial compression tests were conducted on an intact rock sample and a sample with a joint simplified to a plane, using the parallel computing method. The results indicate that the mechanical characteristics and failure process of jointed rocks are significantly affected by the geometry of joints. The presence of joints reduces the uniaxial compressive strength (UCS), elastic modulus, and released acoustic emission (AE) energy of rocks by 37%–67%, 21%–24%, and 52%–90%, respectively. Compared to the simplified joint sample, the proposed photogrammetry-based numerical model makes the most of the limited geometric information on the joints. The UCS, accumulative released AE energy, and elastic modulus of the photogrammetry-based sample were found to be very close to those of the CT-based sample. The UCS value of the simplified joint sample (38.5 MPa) is much lower than that of the CT-based sample (72.3 MPa). Additionally, the accumulative released AE energy observed in the simplified joint sample is 3.899 times lower than that observed in the CT-based sample. CT scanning provides a reliable means to visualize the joints in rocks, which can be used to verify the reliability of photogrammetry techniques. The photogrammetry-based sample enables detailed analysis for estimating the mechanical properties of jointed rocks.
In this research, we present pure open multi-processing (OpenMP), pure message passing interface (MPI), and hybrid MPI/OpenMP parallel solvers within the dynamic explicit central difference algorithm for the coining process, to address the challenge of capturing fine relief features of approximately 50 microns. Achieving such precision demands the utilization of at least 7 million tetrahedron elements, surpassing the capabilities of the traditional serial programs previously developed. To mitigate data races when calculating internal forces, intermediate arrays are introduced within the OpenMP directive. This helps ensure proper synchronization and avoid conflicts during parallel execution. Additionally, in the MPI implementation, the coins are partitioned into the desired number of regions. This division allows for efficient distribution of computational tasks across multiple processes. Numerical simulation examples are conducted to compare the three solvers with the serial programs, evaluating correctness, acceleration ratio, and parallel efficiency. The results reveal a relative error of approximately 0.3% in forming force among the parallel and serial solvers, while the predicted insufficient-material zones align with experimental observations. Additionally, the speedup ratio and parallel efficiency are assessed for the coining process simulation. The pure MPI parallel solver achieves a maximum acceleration of 9.5 on a single computer (utilizing 12 cores) and the hybrid solver exhibits a speedup ratio of 136 on a cluster (using 6 compute nodes and 12 cores per compute node), showing the strong scalability of the hybrid MPI/OpenMP programming model. This approach effectively meets the simulation requirements for commemorative coins with intricate relief patterns.
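The "intermediate arrays" device can be pictured outside OpenMP as well: each worker assembles element contributions into its own private copy of the nodal internal-force array, and the private copies are summed afterwards, so no two workers ever write the same entry concurrently. The sketch below shows this idea in Python; the connectivity, element forces, and chunk count are invented placeholders, and the paper's solver is a compiled OpenMP/MPI code, not this script.

```python
# Race-free assembly via per-worker intermediate arrays followed by a reduction.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

N_NODES = 1000
rng = np.random.default_rng(0)
conn = rng.integers(0, N_NODES, size=(20000, 4))    # hypothetical element-to-node connectivity
elem_force = rng.random((20000, 4))                 # hypothetical element force contributions

def partial_assembly(elems):
    f = np.zeros(N_NODES)                 # private intermediate array for this worker
    for e in elems:
        for a in range(4):
            f[conn[e, a]] += elem_force[e, a]
    return f

if __name__ == "__main__":
    chunks = np.array_split(np.arange(conn.shape[0]), 4)
    with ProcessPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(partial_assembly, chunks))
    internal_force = np.sum(partials, axis=0)   # race-free reduction of the partial arrays
```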
This paper aims to solve large-scale and complex isogeometric topology optimization problems that consume significant computational resources. A novel isogeometric topology optimization method with a hybrid CPU/GPU parallel strategy is proposed, and the hybrid parallel strategies for stiffness matrix assembly, equation solving, sensitivity analysis, and design variable update are discussed in detail. To ensure the high efficiency of CPU/GPU computing, a workload balancing strategy is presented for optimally distributing the workload between the CPU and the GPU. To illustrate the advantages of the proposed method, three benchmark examples are tested to verify the hybrid parallel strategy. The results show that the hybrid method is faster than both serial CPU and parallel GPU computing, with speedups of up to two orders of magnitude.
Hyperparameter tuning is a key step in developing high-performing machine learning models, but searching large hyperparameter spaces requires extensive computation using standard sequential methods. This work analyzes the performance gains from parallel versus sequential hyperparameter optimization. Using scikit-learn's RandomizedSearchCV, this project tuned a Random Forest classifier for fake news detection via randomized grid search. Setting n_jobs to -1 enabled full parallelization across CPU cores. Results show the parallel implementation achieved over 5× faster CPU times and 3× faster total run times compared to sequential tuning. However, test accuracy slightly dropped from 99.26% sequentially to 99.15% with parallelism, indicating a trade-off between evaluation efficiency and model performance. Still, the significant computational gains allow more extensive hyperparameter exploration within reasonable timeframes, outweighing the small accuracy decrease. Further analysis could better quantify this trade-off across different models, tuning techniques, tasks, and hardware.
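A minimal reconstruction of the setup described above is shown below; the data set, parameter ranges, and iteration counts are placeholders rather than the project's actual configuration, but the key switch is the same: n_jobs=-1 parallelizes the candidate evaluations across all CPU cores, while n_jobs=1 reproduces the sequential baseline.

```python
# Randomized search over Random Forest hyperparameters with full CPU parallelism.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=5000, n_features=50, random_state=0)  # stand-in data

param_distributions = {
    "n_estimators": [100, 200, 400],
    "max_depth": [None, 10, 20, 40],
    "min_samples_split": [2, 5, 10],
    "max_features": ["sqrt", "log2"],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=20,
    cv=3,
    n_jobs=-1,      # -1 = use every available core; n_jobs=1 gives the sequential baseline
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```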
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP's effectiveness in accelerating image manipulation tasks.
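The project itself parallelizes the conversion with OpenMP in a compiled language; as an analogous illustration of the same row-wise work partitioning, the Python sketch below splits the image into row chunks, converts each chunk with the common 0.299/0.587/0.114 luminance weights, and stitches the results back together.

```python
# Row-chunked grayscale conversion; the input image here is random placeholder data.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def to_gray(rows):
    # rows: (h, w, 3) uint8 slice; weighted sum over the colour channels
    return (rows @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)

def parallel_grayscale(img, workers=4):
    chunks = np.array_split(img, workers, axis=0)            # split rows across workers
    with ThreadPoolExecutor(max_workers=workers) as pool:    # numpy does the heavy lifting per chunk
        gray_chunks = list(pool.map(to_gray, chunks))
    return np.concatenate(gray_chunks, axis=0)

if __name__ == "__main__":
    image = np.random.randint(0, 256, size=(4096, 4096, 3), dtype=np.uint8)
    gray = parallel_grayscale(image)
```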
With the development of parallel computing technology, non-linear inversion calculation efficiency has been improving. However, for single-point search-based non-linear inversion methods, the implementation of parallel algorithms is a difficult issue. We introduce the idea of group search into the single-point search-based non-linear inversion algorithm, taking the quantum Monte Carlo method as an example for two-dimensional seismic wave velocity inversion and practical impedance inversion, and test the calculation efficiency for different numbers of nodes. The results show that the parallel algorithm is feasible and effective for both theoretical and practical data inversion, and that it has good versatility. The algorithm's efficiency increases with the number of nodes, but the rate of increase gradually diminishes as more nodes are added.
In this paper, an attempt to employ network resources to solve a complex and time-consuming problem is presented. The global illumination problem is selected as the study objective. An improved density estimation algorithm is first developed, in which more of the inherent concurrency is exposed. Then its parallel implementation using a PVM mechanism and an analysis of its running performance are provided. The analysis results show that the expected speed-up is obtained and demonstrate that PVM has good application prospects for parallel computation on a distributed network.
Traditional two-dimensional (2D) complex resistivity forward modeling is based on Poisson's equation, but spectral induced polarization (SIP) data are the coproducts of the induced polarization (IP) and electromagnetic induction (EMI) effects. This is especially true at high frequencies, where the EMI effect can exceed the IP effect. 2D inversion that only considers the IP effect therefore reduces the reliability of the inversion results. In this paper, we derive differential equations using Maxwell's equations. With the introduction of the Cole-Cole model, we use the finite-element method to conduct 2D SIP forward modeling that considers the EMI and IP effects simultaneously. The data-space Occam method, in which different constraints on the model smoothness and parametric boundaries are introduced, is then used to simultaneously obtain the four parameters of the Cole-Cole model using multi-array electric field data. This approach not only improves the stability of the inversion but also significantly reduces the solution ambiguity. To improve the computational efficiency, message passing interface programming was used to accelerate the 2D SIP forward modeling and inversion. Synthetic datasets were tested using both serial and parallel algorithms, and the tests suggest that the proposed parallel algorithm is robust and efficient.
JMCT is a large-scale, high-fidelity, three-dimensional general neutron–photon–electron–proton transport Monte Carlo software system. It was developed based on the combinatorial geometry parallel infrastructure JCOGIN and the adaptive structured mesh infrastructure JASMIN. JMCT is equipped with CAD modeling and visualizes the image output. It supports body geometry and structured/unstructured meshes. JMCT has most of the functions, variance reduction techniques, and tallies of traditional Monte Carlo particle transport codes. Two energy models, multi-group and continuous, are provided. In recent years, some new functions and algorithms have been developed, such as on-the-fly (OTF) Doppler broadening, uniform tally density (UTD), consistent adjoint driven importance sampling (CADIS), fast criticality search of boron concentration (FCSBC), domain decomposition (DD), adaptive control rod moving (ACRM), and random geometry (RG). JMCT is also coupled with the discrete-ordinates SN code JSNT to generate source-biasing factors and weight-window parameters. At present, the numbers of geometric bodies, materials, tallies, depletion zones, and parallel processors that can be handled are sufficiently large to simulate extremely complicated device problems. JMCT can be used to simulate reactor physics, criticality safety analysis, radiation shielding, detector response, nuclear well logging, and dosimetry calculations. In particular, JMCT can be coupled with depletion and thermal-hydraulics codes for the simulation of reactor nuclear-thermal feedback effects. This paper describes the progress in advanced modeling, high-performance numerical simulation of particle transport, multiphysics coupled calculations, and large-scale parallel computing.
In this paper, a mathematical model consisting of forward and backward models is built on parallel genetic algorithms (PGAs) for fault diagnosis in a transmission power system. A new method to reduce the scale of fault sections is developed in the forward model, and the message passing interface (MPI) approach is chosen to parallelize the genetic algorithms with the global single-population master-slave method (GPGAs). The proposed approach is applied to a sample system consisting of 28 sections, 84 protective relays and 40 circuit breakers. Simulation results show that the new model based on GPGAs can achieve very fast computation in online applications to large-scale power systems.
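The global single-population master-slave scheme keeps one population on the master and farms out only the fitness evaluations. The sketch below shows that division of labour with mpi4py; the bit-string individuals and the placeholder fitness function are illustrative assumptions, not the fault-diagnosis objective used in the paper.

```python
# Master-slave fitness evaluation for a single global population (illustrative).
import numpy as np
from mpi4py import MPI

def fitness(ind):
    return float(np.sum(ind))            # placeholder objective, not the diagnosis model

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

POP, GENES = 240, 40
population = np.random.randint(0, 2, size=(POP, GENES)) if rank == 0 else None
population = comm.bcast(population, root=0)

my_rows = np.array_split(np.arange(POP), size)[rank]        # slice assigned to this rank
my_fitness = [(int(i), fitness(population[i])) for i in my_rows]

gathered = comm.gather(my_fitness, root=0)
if rank == 0:
    scores = dict(pair for part in gathered for pair in part)
    # ...the master would now apply selection, crossover and mutation, then repeat...
```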
The discrete fracture network model is a powerful tool for fractured rock mass fluid flow simulations and supports safety assessments of coal mine hazards such as water inrush. Intersection analysis, which identifies all pairs of intersected fractures (the basic components composing the connectivity of a network), is one of its crucial procedures. This paper attempts to improve intersection analysis through parallel computing. Considering seamless interfacing with the other procedures in the modeling, two algorithms are designed and presented, of which one is a completely independent parallel procedure with some redundant computations and the other is an optimized version with reduced redundancy. A numerical study indicates that both algorithms are practical and can significantly improve the computational performance of intersection analysis for large-scale simulations. Moreover, the preferred application conditions for the two algorithms are also discussed.
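The pair-testing workload is naturally parallel: every rank can hold the same fracture set, test a disjoint share of the candidate pairs, and report its hits. The mpi4py sketch below illustrates that generic pattern (it is not either of the paper's two algorithms), using an axis-aligned bounding-box overlap test as a stand-in for the real geometric intersection check.

```python
# Cyclic distribution of fracture pairs; bounding-box overlap stands in for intersection.
import numpy as np
from itertools import combinations
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(0)                      # same synthetic fracture set on every rank
lo = rng.random((500, 3)) * 100.0
boxes = np.stack([lo, lo + rng.random((500, 3)) * 10.0], axis=1)   # (n, 2, 3) min/max corners

def boxes_overlap(a, b):
    return bool(np.all(a[0] <= b[1]) and np.all(b[0] <= a[1]))

pairs = list(combinations(range(len(boxes)), 2))
my_pairs = pairs[rank::size]                        # cyclic (round-robin) share of pairs

my_hits = [(i, j) for i, j in my_pairs if boxes_overlap(boxes[i], boxes[j])]
all_hits = comm.gather(my_hits, root=0)
if rank == 0:
    intersected = [p for part in all_hits for p in part]
```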
Funding (GPU-accelerated finite particle method study): the National Key Research and Development Program of China (Grant No. 2016YFC0800200), the National Natural Science Foundation of China (Grant Nos. 51578494 and 51778568), and the Fundamental Research Funds for the Central Universities (Grant No. 2019QNA4043).
Funding (MPI cloud-derived wind inversion study): supported in part by the National Natural Science Foundation of China (61872160, 51679105, 51809112) and the "Thirteenth Five Plan" Science and Technology Project of the Education Department, Jilin Province (JJKH20200990KJ).
Funding (GPU Euler/Navier-Stokes solver study): the National Natural Science Foundation of China (No. 11172134) and the Funding of Jiangsu Innovation Program for Graduate Education (No. CXLX13_132).
Funding (parallelized 4D-LSM study): National Natural Science Foundation of China, Grant/Award Number: 51979187.
Funding (DC_MPI study): the Deanship of Scientific Research at King Abdulaziz University, Jeddah, Saudi Arabia, under Grant No. RG-12-611-43.
Funding (equivalent linearization study): the Fundamental Research Funds for the Central Universities under Grant No. 2682022CX072 and the Research and Development Plan in Key Areas of Guangdong Province under Grant No. 2020B0202010008.
Funding (jointed rock CT/photogrammetry study): the National Natural Science Foundation of China (Grant Nos. 42277150, 41977219) and the Henan Provincial Science and Technology Research Project (Grant No. 222102320271).
Funding (parallel coining simulation study): the fund from Shenyang Mint Company Limited (No. 20220056), the Senior Talent Foundation of Jiangsu University (No. 19JDG022), and the Taizhou City Double Innovation and Entrepreneurship Talent Program (No. Taizhou Human Resources Office [2022] No. 22).
Funding (CPU/GPU isogeometric topology optimization study): the National Key R&D Program of China (2020YFB1708300), the National Natural Science Foundation of China (52005192), and the Project of the Ministry of Industry and Information Technology (TC210804R-3).
Funding (parallel non-linear inversion study): the National Key S&T Special Projects of Marine Carbonate (No. 2008ZX05000-004) and CNPC Projects (No. 2008E-0610-10).
Funding (2D SIP modeling and inversion study): jointly sponsored by the National Natural Science Foundation of China (Grant No. 41374078), the Geological Survey Projects of the Ministry of Land and Resources of China (Grant Nos. 12120113086100 and 12120113101300), and the Beijing Higher Education Young Elite Teacher Project.
Funding (JMCT study): the National Natural Science Foundation of China (Nos. 11805017 and 12001050).
Funding (parallel genetic algorithm fault diagnosis study): the National Natural Science Foundation of China (No. 50677062), the New Century Excellent Talents in University of China (No. NCET-07-0745), and the Natural Science Foundation of Zhejiang Province, China (No. R107062).
Funding (parallel intersection analysis study): the National Basic Research Program of China (973 Program) (2010CB428801, 2010CB428804), the National High-tech R&D Program of China (863 Program) (2011AA050105), the National Science Foundation of China (40972166), and the National Science and Technology Major Project of China (2011ZX05060-005).