Abstract: In this paper, a class of real-time parallel combined methods (RTPCM) for the digital simulation of a partitioned large system is presented. Parallelism across the system is combined with parallelism across the method. The stiff subsystems are then solved in parallel on a parallel computer by a parallel Rosenbrock method, and the non-stiff subsystems by a parallel RK method. The construction, convergence and numerical stability of the methods are discussed, and digital simulation experiments are conducted.
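The idea of advancing the stiff and non-stiff subsystems concurrently can be sketched as follows. This is a minimal illustration under stated assumptions, not the RTPCM scheme itself. The following are all hypothetical:

- the coupled test system and the function names;
- Heun's method (explicit RK2) for the non-stiff part;
- a one-stage linearly implicit Euler step, the simplest Rosenbrock-type method, for the stiff part.

Each subsystem uses the other's state only at t_n, so the two updates are independent and can go to separate workers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical coupled test system (assumption, for illustration only):
#   y' = f(y, z) = -y + z              (non-stiff)
#   z' = g(y, z) = -1000 * (z - y)     (stiff, dg/dz = -1000)
def f(y, z):
    return -y + z

def g(y, z):
    return -1000.0 * (z - y)

DG_DZ = -1000.0  # Jacobian of g with respect to z (a scalar here)

def nonstiff_step(y, z_frozen, h):
    """Heun's method (explicit RK2) for y, with z frozen at its t_n value."""
    k1 = f(y, z_frozen)
    k2 = f(y + h * k1, z_frozen)
    return y + 0.5 * h * (k1 + k2)

def stiff_step(z, y_frozen, h):
    """One-stage linearly implicit (Rosenbrock-type) Euler step for z,
    with y frozen at its t_n value: (1 - h*J) k = g(y_n, z_n)."""
    k = g(y_frozen, z) / (1.0 - h * DG_DZ)
    return z + h * k

def parallel_step(pool, y, z, h):
    """Both updates read only (y_n, z_n), so they can run concurrently."""
    fy = pool.submit(nonstiff_step, y, z, h)
    fz = pool.submit(stiff_step, z, y, h)
    return fy.result(), fz.result()

if __name__ == "__main__":
    y, z, h = 1.0, 0.0, 0.01
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(200):
            y, z = parallel_step(pool, y, z, h)
    print(y, z)
```

The sketch freezes the coupling variable at t_n (a Jacobi-type splitting). This removes the sequential dependence between the subsystems, at the cost of a splitting error. With h = 0.01 (so h·1000 = 10), an explicit Euler step on the stiff part would be unstable, since |1 - 10| > 1. The linearly implicit step is stable at that step size.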
Funding: This project was supported by the National Natural Science Foundation of China (19871080).
Abstract: A class of modified parallel combined methods for the real-time numerical simulation of a stiff dynamic system is presented. The methods combine parallelism across the system with parallelism across the method. They also relax the dependence of the stage-value computation on the sampling times of the input function. The stiff subsystems are solved in parallel on a parallel computer by a parallel Rosenbrock method, and the non-stiff subsystems by a parallel RK method. The order conditions and convergence of the methods are discussed. Numerical simulation experiments show that this class of modified algorithms achieves high speed and efficiency.
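In real-time simulation, a classical RK stage at t_n + c_i·h needs the input u at a time whose sample may not have arrived yet. One simple way to remove that dependence is a zero-order hold: every stage uses the latest available sample u_n. The sketch below takes that approach. It is an illustrative assumption, not the authors' modification, and the test equation and function names are hypothetical.

```python
import math

# Hypothetical input-driven test system (assumption): y' = -y + u(t).
def rhs(y, u):
    return -y + u

def rk4_step_zoh(y, u_n, h):
    """Classical RK4 step in which every stage uses the latest available
    input sample u_n (zero-order hold) instead of u(t_n + c_i*h)."""
    k1 = rhs(y, u_n)
    k2 = rhs(y + 0.5 * h * k1, u_n)
    k3 = rhs(y + 0.5 * h * k2, u_n)
    k4 = rhs(y + h * k3, u_n)
    return y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def simulate(samples, h, y0=0.0):
    """Advance one step per input sample. Each step uses only data that
    is already available at t_n; returns the trajectory including y0."""
    y = y0
    out = [y]
    for u_n in samples:
        y = rk4_step_zoh(y, u_n, h)
        out.append(y)
    return out

if __name__ == "__main__":
    h = 0.01
    samples = [math.sin(n * h) for n in range(1000)]
    print(simulate(samples, h)[-1])
```

Holding the input constant over a step introduces an O(h) input error. For a time-varying u, the overall accuracy of this particular scheme therefore drops to first order, even though the RK4 stages themselves are fourth order.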
Abstract: The combined finite-discrete element method (FDEM) belongs to a family of methods of computational mechanics of discontinua. The method suits discontinua problems in which particles are deformable and can fracture or fragment. FDEM is now applied in a number of disciplines, including rock mechanics, where it can solve problems such as mining, mineral processing and rock blasting. In this work, a novel approach to parallelizing two-dimensional (2D) FDEM is developed for clusters and desktop computers. Parallel solvers based on dynamic domain decomposition, covering all aspects of FDEM, have been developed. They have been implemented in the open-source Y2D software package and tested on a PC cluster. The overall performance and scalability of the parallel code have been studied using numerical examples. The results confirm that the parallel implementation is suitable for solving large-scale problems. © 2014 Institute of Rock and Soil Mechanics, Chinese Academy of Sciences. Production and hosting by Elsevier B.V. All rights reserved.
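Dynamic domain decomposition, in its simplest form, reassigns elements to processors as the simulation evolves so that each processor keeps a similar workload. The sketch below is a hypothetical illustration, not the decomposition used in Y2D. It cuts the domain into strips along x by sorted element centroids, weighted by a per-element cost. The function names, the cost weights and the 20% imbalance threshold are assumptions.

```python
def strip_decomposition(centroids, weights, n_parts):
    """Split elements (2D centroids) into n_parts strips along x so that
    each strip carries roughly the same total weight (workload).
    Assumes positive weights. Returns one part id per element."""
    order = sorted(range(len(centroids)), key=lambda i: centroids[i][0])
    total = float(sum(weights))
    parts = [0] * len(centroids)
    acc = 0.0
    for elem in order:
        # Part index from the weight accumulated *before* this element.
        parts[elem] = min(int(acc * n_parts / total), n_parts - 1)
        acc += weights[elem]
    return parts

def part_loads(parts, weights, n_parts):
    """Total weight assigned to each part."""
    loads = [0.0] * n_parts
    for p, w in zip(parts, weights):
        loads[p] += w
    return loads

def needs_rebalance(parts, weights, n_parts, tol=0.2):
    """Trigger a new decomposition when the heaviest part exceeds the
    ideal load by more than tol, e.g. after fracture concentrates
    contacts in one region (hypothetical criterion)."""
    loads = part_loads(parts, weights, n_parts)
    ideal = sum(weights) / n_parts
    return max(loads) > (1.0 + tol) * ideal
```

In a time-stepping loop, the element costs would be updated each step, and `strip_decomposition` would be called again only when `needs_rebalance` returns true. Cutting along x alone is the crudest choice. Recursive coordinate bisection or graph partitioning usually gives more compact subdomains and fewer inter-processor contacts.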
Funding: This work was supported by the National Key Research and Development Program of China [Grant No. 2016YFC0800200], the National Natural Science Foundation of China [Grant Nos. 51778568, 51908492, and 52008366], and the Zhejiang Provincial Natural Science Foundation of China [Grant Nos. LQ21E080019 and LY21E080022]. This work was also supported by the Key Laboratory of Space Structures of Zhejiang Province (Zhejiang University) and the Center for Balance Architecture of Zhejiang University.
Abstract: Large deformation contact problems generally involve highly nonlinear behavior, which makes them very time-consuming to solve and may lead to convergence issues. The finite particle method (FPM) effectively separates pure deformation from total motion in large deformation problems. In addition, the decoupled procedures of the FPM make it suitable for parallel computing, which offers a way to reduce computation time. In this study, a graphics processing unit (GPU)-based parallel algorithm is proposed for two-dimensional large deformation contact problems.

The fundamentals of the FPM for planar solids are first briefly introduced, including the equations of motion of particles and the internal forces of quadrilateral elements. Next, a linked-list data structure suitable for parallel processing is built, and parallel global and local search algorithms are presented for contact detection. The contact forces are then derived and applied directly to the particles. The main solution procedures of the proposed method are executed in parallel on a GPU.

Two verification problems involving large deformation frictional contact are presented, and the accuracy of the proposed algorithm is validated. The algorithm's performance is then investigated on a large-scale contact problem. The baseline is the commercial finite element software Abaqus/Explicit running on a single-core central processing unit (CPU). Relative to this baseline, the maximum speedups reach 28.5 for total computational time and 77.4 for contact calculation. With the FPM, contact calculation accounts for only 18% of the total calculation time, much less than the 50% with Abaqus/Explicit, which demonstrates the efficiency of the proposed method.
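The linked-list structure for the global (broad-phase) contact search can be illustrated with the classic cell linked list:

- particles are binned into a uniform grid;
- each cell stores the index of its first particle (`head`);
- each particle stores the next particle in the same cell (`nxt`).

The serial Python sketch below is only an illustration; the paper's actual data layout may differ. On a GPU, each particle or cell would typically be handled by its own thread. The function names, the cell size equal to the search radius, and the non-negative coordinate range are assumptions.

```python
import math
from itertools import product

def build_cell_list(points, cell_size, nx, ny):
    """Bin 2D points (assumed to lie in [0, nx*cell_size) x [0, ny*cell_size))
    into an nx-by-ny grid using head/next arrays.
    head[c] is the first particle in cell c (-1 if empty);
    nxt[i] is the next particle in the same cell as i (-1 at the end)."""
    head = [-1] * (nx * ny)
    nxt = [-1] * len(points)
    for i, (x, y) in enumerate(points):
        cx = min(int(x // cell_size), nx - 1)
        cy = min(int(y // cell_size), ny - 1)
        c = cy * nx + cx
        nxt[i] = head[c]  # push particle i onto the front of cell c's list
        head[c] = i
    return head, nxt

def candidate_pairs(points, radius, nx, ny):
    """Broad-phase search: all pairs (i, j), i < j, closer than radius,
    checking only the particle's own cell and its 8 neighbours."""
    cell = radius  # cell size >= search radius, so 3x3 neighbours suffice
    head, nxt = build_cell_list(points, cell, nx, ny)
    pairs = set()
    for cy, cx in product(range(ny), range(nx)):
        i = head[cy * nx + cx]
        while i != -1:
            for dy, dx in product((-1, 0, 1), repeat=2):
                ncx, ncy = cx + dx, cy + dy
                if not (0 <= ncx < nx and 0 <= ncy < ny):
                    continue
                j = head[ncy * nx + ncx]
                while j != -1:
                    if i < j and math.dist(points[i], points[j]) < radius:
                        pairs.add((i, j))
                    j = nxt[j]
            i = nxt[i]
    return pairs
```

Building the lists costs O(N). Each particle is then compared only with particles in the surrounding 3×3 cells, rather than with all N particles. The resulting candidate pairs would be passed to the local (narrow-phase) search.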
Abstract: In this paper, a third-order combination method with three processes and a fourth-order combination method with five processes for solving ODEs are discussed. These methods combine the Runge-Kutta method with a linear multistep method, which overcomes the defect of the third-order parallel Runge-Kutta method discussed in [1].
Abstract: In this paper, large eddy simulation (LES) combined with a high-performance parallel computing method is applied to simulate the flow in a curved duct with a square cross-section. The method consists of the following steps, among others:

- parallel domain decomposition of the grid;
- creation of a virtual diagonal bordered matrix;
- assembly of the boundary matrix;
- parallel LDL^T decomposition;
- parallel solution of the Poisson equation;
- parallel convergence estimation.

The parallel computing method can solve problems that are difficult to solve with traditional serial computing. Furthermore, existing microcomputers can be fully exploited to solve some large-scale problems of complex turbulent flow.
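The LDL^T decomposition factors a symmetric matrix A as A = L D L^T, with L unit lower triangular and D diagonal. A linear system such as a discretized Poisson equation is then solved by a forward substitution, a diagonal scaling and a back substitution. The sketch below is a serial, dense, pivot-free version for illustration only. The paper's parallel, domain-decomposed variant is not reproduced, and the 1D Poisson test matrix is an assumption.

```python
def ldlt(A):
    """Factor a symmetric matrix A (list of lists) as L D L^T without
    pivoting. Returns (L, d): L unit lower triangular, d the diagonal of D.
    Assumes all leading principal minors are nonzero."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = s / d[j]
    return L, d

def ldlt_solve(L, d, b):
    """Solve L D L^T x = b by forward, diagonal and backward sweeps."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                      # forward: L y = b
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    z = [y[i] / d[i] for i in range(n)]     # diagonal: D z = y
    x = [0.0] * n
    for i in reversed(range(n)):            # backward: L^T x = z
        x[i] = z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x

def poisson_1d(n):
    """Tridiagonal matrix of -u'' on n interior points (scaled by h^2)."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]
```

For this tridiagonal matrix, the pivots follow d_j = 2 - 1/d_{j-1}, which gives d_j = (j + 2)/(j + 1). All pivots are therefore positive, and no pivoting is required.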
Abstract: In this paper, parallel Rosenbrock methods for real-time simulation on parallel computers are presented. Their construction, convergence and numerical stability are studied. Numerical simulation experiments are conducted on a personal computer and on a parallel computer.
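A Rosenbrock method avoids Newton iterations: each stage requires only a linear solve with the matrix (I - γhJ), where J is the Jacobian. The sketch below implements the two-stage, second-order, L-stable ROS2 scheme (γ = 1 + 1/√2) for a scalar autonomous ODE. This scheme and the test problem y' = -50(y - 1) were chosen for brevity; the parallel Rosenbrock methods of the paper are not reproduced.

```python
import math

GAMMA = 1.0 + 1.0 / math.sqrt(2.0)  # ROS2 parameter (L-stable choice)

def ros2_step(f, jac, y, h):
    """One ROS2 step for a scalar autonomous ODE y' = f(y).
    The linear solves with (1 - GAMMA*h*J) reduce to divisions here."""
    m = 1.0 - GAMMA * h * jac(y)
    k1 = f(y) / m
    k2 = (f(y + h * k1) - 2.0 * k1) / m
    return y + 1.5 * h * k1 + 0.5 * h * k2

def integrate(f, jac, y0, h, n_steps):
    """Take n_steps fixed-size ROS2 steps from y0."""
    y = y0
    for _ in range(n_steps):
        y = ros2_step(f, jac, y, h)
    return y

# Stiff test problem (assumption): y' = -50 (y - 1), solution decays to 1.
def stiff_rhs(y):
    return -50.0 * (y - 1.0)

def stiff_jac(y):
    return -50.0

if __name__ == "__main__":
    print(integrate(stiff_rhs, stiff_jac, 0.0, 0.1, 50))
```

With h = 0.1 (so hλ = -5), explicit Euler would multiply the error by 1 - 5 = -4 per step and diverge. ROS2's amplification factor at hλ = -5 is below 1 in magnitude, and it tends to 0 as hλ → -∞.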
Abstract: In this paper, a fourth-order parallel computation method with four processes for solving ODEs is discussed. This method combines the Runge-Kutta method with a linear multistep method, which overcomes the difficulties of the fourth-order parallel Runge-Kutta method discussed in [1]. The concept of critical speedup for parallel methods is also defined, and the speedups of some methods are analyzed using this concept.
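Abstract: Since the release of ChatGPT, large language model (LLM) products have appeared in quick succession, and an era of large models is arriving. However, large models have very large parameter counts and long training times, so traditional machine learning training methods are not suitable for them. New distributed training methods and strategies therefore urgently need to be explored. To address these issues, this paper reviews the progress of distributed training methods for large models over the past decade or so from three aspects: distributed training architectures, parallel acceleration strategies, and memory and computation optimization. Finally, it proposes directions for future research.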
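Data parallelism is a common example of a parallel acceleration strategy. Each worker computes the gradient on its own shard of the mini-batch, and the gradients are averaged (an all-reduce) before a shared parameter update. The pure-Python sketch below simulates this for a least-squares linear model. The model, the shard layout and the function names are hypothetical, and no real distributed framework is used. With equal-sized shards, the averaged gradient equals the full-batch gradient.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the model y ~ w[0]*x + w[1]."""
    n = len(xs)
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        r = w[0] * x + w[1] - y
        g0 += 2.0 * r * x / n
        g1 += 2.0 * r / n
    return [g0, g1]

def all_reduce_mean(grads):
    """Simulated all-reduce: element-wise mean of per-worker gradients."""
    k = len(grads)
    return [sum(g[i] for g in grads) / k for i in range(len(grads[0]))]

def data_parallel_step(w, xs, ys, n_workers, lr):
    """Split the batch into equal shards, compute local gradients (on
    real hardware these run concurrently), average them, update w."""
    shard = len(xs) // n_workers
    if shard * n_workers != len(xs):
        raise ValueError("batch size must be divisible by n_workers")
    local = [grad_mse(w, xs[r * shard:(r + 1) * shard],
                      ys[r * shard:(r + 1) * shard])
             for r in range(n_workers)]
    g = all_reduce_mean(local)
    return [wi - lr * gi for wi, gi in zip(w, g)]
```

In a real system, the all-reduce would be a collective operation over the interconnect. Model, pipeline and tensor parallelism address a different case: the model itself does not fit on one device.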