Objective: To reduce the execution time of neural network training. Methods: A parallel particle swarm optimization algorithm based on the master-slave model is proposed to train radial basis function neural networks; it is implemented on a cluster using MPI libraries for inter-process communication. Results: A high speed-up factor is achieved and execution time is greatly reduced. Moreover, the resulting neural network has good classification accuracy on both the training sets and the test sets. Conclusion: Because fitness evaluation is computationally intensive, parallel particle swarm optimization offers a clear advantage in speeding up neural network training.
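The master-slave scheme described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: a thread pool stands in for the MPI slave processes, the `sphere` function is a hypothetical placeholder for the RBF-network training error, and the inertia and acceleration constants (0.7, 1.5, 1.5) are assumed values.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sphere(x):
    """Placeholder fitness (assumption); the paper's fitness is the RBF training error."""
    return sum(v * v for v in x)


def parallel_pso(fitness, dim=4, n_particles=12, iters=40, workers=4, seed=0):
    """Master-slave PSO: the master updates the swarm, workers evaluate fitness."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Slaves: evaluate the initial swarm concurrently.
        pbest_fit = list(pool.map(fitness, pos))
        g = min(range(n_particles), key=pbest_fit.__getitem__)
        gbest, gbest_fit = pbest[g][:], pbest_fit[g]
        initial_best = gbest_fit
        for _ in range(iters):
            # Master: velocity and position update for every particle.
            for i in range(n_particles):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][d] = (0.7 * vel[i][d]
                                 + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                                 + 1.5 * r2 * (gbest[d] - pos[i][d]))
                    pos[i][d] += vel[i][d]
            # Slaves: the expensive fitness evaluations are farmed out.
            fits = list(pool.map(fitness, pos))
            # Master: gather results and update personal and global bests.
            for i, f in enumerate(fits):
                if f < pbest_fit[i]:
                    pbest_fit[i], pbest[i] = f, pos[i][:]
                    if f < gbest_fit:
                        gbest_fit, gbest = f, pos[i][:]
    return gbest, gbest_fit, initial_best
```

Calling `parallel_pso(sphere)` returns the best position, its fitness, and the best fitness of the initial swarm. Python threads do not speed up CPU-bound work, so a real deployment would replace the pool with separate processes or MPI ranks, as the abstract describes.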
Heat transfer between two corresponding plates, disks, and concentric pipes has many applications, including water cleansing and lubrication. Furthermore, TiO₂-water-based nanofluids are widely used because they are useful for operating and controlling temperature, especially in photovoltaic technology and solar panels. Motivated by these applications, the current study examines the effect of nanoparticle aggregation on magnetohydrodynamic (MHD) flow between rotating parallel plates with a chemical reaction. To achieve maximum heat transport, the Bruggeman model is used to adapt the Maxwell model. Melting and thermal radiation effects are also included in the model to discuss heat transport. The Runge-Kutta-Fehlberg 4th-5th order method is used to obtain numerical solutions. The main focus of the study is the thermodynamic behavior under several aspects of nanoparticle aggregation. The heat transfer rate between the parallel plates is enhanced by increasing the thermophoresis, radiation, and Brownian motion parameters. A rise in the Schmidt number and the chemical reaction rate parameter decreases the concentration distribution. This study will be helpful in enhancing the thermal efficiency of photovoltaic technology in solar plates, water purification, thermal management of electronic devices, the design of effective cooling systems, and other sustainable technologies.
This paper studies the libration and stabilization of a parallel partial space elevator system in circular orbits. The system is made up of two parallel partial space elevators, each consisting of one main satellite, one end body, and a climber moving along the tether between them. The libration characteristics of the elevator are studied through numerical analysis with a new dynamic model, and a novel control strategy is proposed to stabilize the swing of the end body by projecting the climber speeds only. An optimal control method is used to implement the new control strategy in the case where the climbers move in opposite directions. The simulation results validate the effectiveness of the proposed control strategy, whose application neither sacrifices transport efficiency nor significantly exacerbates libration.
A novel Hilbert curve is introduced for parallel spatial data partitioning, taking into account the very large volume of spatial information and the variable length of vector data items. Based on the improved Hilbert curve, an algorithm can be designed to achieve almost-uniform spatial data partitioning among multiple disks in parallel spatial databases. Data imbalance can thus be largely avoided, and search and query efficiency can be enhanced.
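The idea of size-aware partitioning along a space-filling curve can be sketched as follows. The sketch uses the standard Hilbert curve index, not the improved curve of the paper, and the cut rule (assigning each item by the byte offset of its midpoint) is an assumption chosen for illustration. The names `xy2d` and `partition_by_size` are hypothetical.

```python
def xy2d(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its Hilbert index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_by_size(items, n, k):
    """Split items (x, y, size_bytes) into k groups of nearly equal total size.

    Items are ordered along the Hilbert curve so that each group stays
    spatially compact; the disk is chosen from the byte offset of the
    item's midpoint (an illustrative rule, not the paper's).
    """
    ordered = sorted(items, key=lambda it: xy2d(n, it[0], it[1]))
    total = sum(it[2] for it in ordered)
    parts = [[] for _ in range(k)]
    running = 0
    for it in ordered:
        disk = min(k - 1, int((running + it[2] / 2) * k / total))
        parts[disk].append(it)
        running += it[2]
    return parts
```

Because the cut points depend on accumulated bytes rather than item counts, a few very long vector items do not overload a single disk.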
As a novel particle method for explicit dynamics, the finite particle method (FPM) does not require the formation or solution of global matrices, and the evaluations of the element equivalent forces and particle displacements are naturally decoupled, which makes the method suitable for parallelization. The FPM also requires an acceleration strategy to overcome the heavy computational burden of its explicit framework for time-dependent dynamic analysis. To this end, a GPU-accelerated parallel strategy for the FPM is proposed in this paper. By taking advantage of the independence of each step of the FPM workflow, a generic parallelized computational framework for multiple types of analysis is established. Using the Compute Unified Device Architecture (CUDA), the GPU implementations of the main tasks of the FPM, such as evaluating and assembling the element equivalent forces and solving the kinematic equations for particles, are elaborated through careful thread management and memory optimization. Performance tests show that speedup ratios of 8, 25, and 48 are achieved for beams, hexahedral solids, and triangular shells, respectively. For examples consisting of explicit dynamic analyses of shells and solids, comparisons with Abaqus using 1 to 8 CPU cores validate the accuracy of the results and demonstrate a maximum speed improvement of a factor of 11.2.
We enhance a robust parallel finite element model for coastal and estuarine cases by using an N-Best refinement algorithm in a multilevel partitioning scheme. Graph partitioning is an important step in constructing the parallel model, in which computation speed is a major concern. The partitioning strategy divides the research domain into several sub-domains of roughly equal size while minimizing the total weight of edges between different sub-domains. Multilevel schemes for graph partitioning are divided into three phases: coarsening, partitioning, and uncoarsening. For the uncoarsening phase, many refinement algorithms have been proposed previously, such as KL, Greedy, and Boundary refinements. In this study, we propose an N-Best refinement algorithm and show its advantages in a case study of Xiamen Bay. Compared with the original partitioning algorithm in previous models, the N-Best algorithm speeds up the computation by a factor of 1.9, and the simulation results agree well with the in-situ data.
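The two quantities that the partitioning strategy above trades off, the weight of edges cut between sub-domains and the balance of sub-domain sizes, can be computed as in the sketch below. It only evaluates a given partition; it does not implement the N-Best refinement itself, and the function names are hypothetical.

```python
def edge_cut(edges, part):
    """Total weight of edges whose endpoints lie in different sub-domains.

    edges: iterable of (u, v, weight); part: dict mapping node -> sub-domain id.
    """
    return sum(w for u, v, w in edges if part[u] != part[v])


def imbalance(part, k):
    """Largest sub-domain size divided by the ideal size (1.0 means perfectly even)."""
    sizes = [0] * k
    for sub in part.values():
        sizes[sub] += 1
    return max(sizes) * k / len(part)


# A 4-node path graph 0-1-2-3 with a heavy middle edge.
edges = [(0, 1, 1), (1, 2, 5), (2, 3, 1)]
split_middle = {0: 0, 1: 0, 2: 1, 3: 1}   # cuts the heavy edge
split_outer = {0: 0, 1: 1, 2: 1, 3: 0}    # cuts the two light edges
```

Both partitions are perfectly balanced, but `split_outer` has the smaller cut, which is the kind of improvement a refinement step in the uncoarsening phase searches for.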
A new formulation for tracking multiple particles in slow viscous flow for microfluidic applications is presented. The method manipulates the boundary element matrices so that a system of equations is finally obtained relating the rigid-body velocities of the particle to the forces applied on the particle. The formulation is specially designed for particle trajectory tracking and involves successive matrix multiplications, to which SMP (symmetric multiprocessing) parallelisation is applied. It is observed that the present formulation offers an efficient numerical model for particle tracking and can easily be extended to multiphysics simulations in which several physics are involved.
The design, analysis, and parallel implementation of the particle filter (PF) were investigated. First, to tackle the particle degeneracy problem in the PF, an iterated importance density function (IIDF) was proposed, in which a new term associated with the current measurement information (CMI) was introduced into the expression of the sampled particles. Through repeated use of the least squares estimate, the CMI can be integrated into the sampling stage in an iterative manner, which greatly improves sampling quality. Running the IIDF yields an iterated PF (IPF). Subsequently, a parallel resampling (PR) scheme was proposed for the parallel implementation of the IPF; its main idea is the same as that of systematic resampling (SR), but it is performed differently. The PR directly uses the integer part of the product of the particle weight and the particle number as the number of times a particle is replicated, and it simultaneously eliminates the particles with the smallest weights; these are the two key differences from the SR. Finally, the detailed procedures for implementing the PR-based IPF on a graphics processing unit were presented. The performance of the IPF, the PR, and their parallel implementations is illustrated via a one-dimensional numerical simulation and a practical application to passive radar target tracking.
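The replication rule stated in the abstract above (copy each particle as many times as the integer part of weight times particle number) can be sketched as follows. The abstract does not say how the remaining slots are filled, so the rule used here, giving them to the particles with the largest fractional parts, is an assumption, as is the function name `parallel_resample`.

```python
import math


def parallel_resample(weights):
    """Return the indices of the particles kept after resampling.

    Each particle i is replicated floor(N * w_i) times, which can be computed
    independently per particle and is therefore easy to parallelize.
    Leftover slots go to the largest fractional parts (assumed rule).
    """
    n = len(weights)
    total = sum(weights)
    scaled = [n * w / total for w in weights]
    counts = [math.floor(s) for s in scaled]
    leftover = n - sum(counts)
    by_fraction = sorted(range(n), key=lambda i: scaled[i] - counts[i], reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return [i for i, c in enumerate(counts) for _ in range(c)]


indices = parallel_resample([0.5, 0.25, 0.125, 0.125])
```

With these four weights the heaviest particle is copied twice and one of the two lightest particles is dropped, so the particle count stays at four.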
Large deformation contact problems generally involve highly nonlinear behaviors, which are very time-consuming to compute and may lead to convergence issues. The finite particle method (FPM) effectively separates pure deformation from total motion in large deformation problems. In addition, the decoupled procedures of the FPM make it suitable for parallel computing, which may provide a way to reduce the computation time. In this study, a graphics processing unit (GPU)-based parallel algorithm is proposed for two-dimensional large deformation contact problems. The fundamentals of the FPM for planar solids are first briefly introduced, including the equations of motion of particles and the internal forces of quadrilateral elements. Subsequently, a linked-list data structure suitable for parallel processing is built, and parallel global and local search algorithms are presented for contact detection. The contact forces are then derived and exerted directly on particles. The proposed method is implemented with the main solution procedures executed in parallel on a GPU. Two verification problems comprising large deformation frictional contacts are presented, and the accuracy of the proposed algorithm is validated. Furthermore, the algorithm's performance is investigated via a large-scale contact problem, and the maximum speedups of total computational time and contact calculation reach 28.5 and 77.4, respectively, relative to the commercial finite element software Abaqus/Explicit running on a single-core central processing unit (CPU). Contact calculation accounts for only 18% of the total calculation time with the FPM, much less than the 50% with Abaqus/Explicit, demonstrating the efficiency of the proposed method.
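A linked-list cell structure of the kind mentioned in the abstract above is a common basis for contact search. The sketch below is a serial, two-dimensional illustration under assumed conventions (square cells of side `cell`, contact candidates defined as particle pairs within distance `cell`); it is not the paper's GPU algorithm, and all names are hypothetical.

```python
def build_cell_list(points, cell):
    """Bin particles into square cells using head/next linked lists."""
    head = {}                  # cell key -> index of the first particle in that cell
    nxt = [-1] * len(points)   # nxt[i] -> next particle in the same cell, or -1
    for i, (x, y) in enumerate(points):
        key = (int(x // cell), int(y // cell))
        nxt[i] = head.get(key, -1)
        head[key] = i
    return head, nxt


def candidate_pairs(points, cell):
    """Global search over the 3x3 neighbouring cells, then a local distance check."""
    head, nxt = build_cell_list(points, cell)
    pairs = set()
    for i, (x, y) in enumerate(points):
        cx, cy = int(x // cell), int(y // cell)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                j = head.get((cx + dx, cy + dy), -1)
                while j != -1:
                    close = (points[j][0] - x) ** 2 + (points[j][1] - y) ** 2 <= cell * cell
                    if i < j and close:
                        pairs.add((i, j))
                    j = nxt[j]
    return pairs
```

The outer loop over particles has no dependencies between iterations, which is what makes this layout attractive for one-thread-per-particle execution on a GPU.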
A new recursive algorithm with a partial parallel structure, based on the linearly constrained minimum variance (LCMV) criterion, is proposed for adaptive monopulse systems. The weight vector associated with the original whole antenna array is first decomposed into several adaptive weight sub-vectors. An adaptive algorithm based on the conventional LCMV principle is then derived to update the weight sub-vectors for the sum and difference beams, respectively. The optimal weight vector is obtained after convergence. The computational complexity of the proposed technique is evaluated; it is on the order of O(N) and lower than that of the conventional LCMV method. The flow chart of the proposed algorithm with its partial parallel structure is introduced. This scheme is easy to implement on a distributed computer/digital signal processor (DSP) system to address the heavy computational burden and vast data transmission of a large-scale adaptive monopulse array. The monopulse ratio and convergence rate of the proposed algorithm are then evaluated by numerical simulations. Compared with some recent adaptive monopulse estimation methods, the proposed adaptive method achieves better performance in computational complexity and monopulse ratio.
The high-resolution DEM-IMB-LBM model can accurately describe pore-scale fluid-solid interactions, but its potential for geotechnical engineering analysis has not been fully realized because of its prohibitive computational cost. To overcome this limitation, a message passing interface (MPI) parallel DEM-IMB-LBM framework is proposed to enhance computational efficiency. The framework uses a static domain decomposition scheme, in which the entire computational domain is decomposed into multiple subdomains according to the predefined processors. A detailed parallel strategy is employed for both contact detection and hydrodynamic force calculation. In particular, a particle ID re-numbering scheme is proposed to handle particle transitions across subdomain interfaces. Two benchmarks are conducted to validate the accuracy and overall performance of the proposed framework. Subsequently, the framework is applied to simulate scenarios involving multi-particle sedimentation and submarine landslides. The numerical examples demonstrate the robustness and applicability of the MPI parallel DEM-IMB-LBM framework.
Neutron-skin thickness is a key parameter of a neutron-rich nucleus; however, it is difficult to determine. In the framework of the Lanzhou Quantum Molecular Dynamics (LQMD) model, a possible probe for the neutron-skin thickness (δ_np) of neutron-rich ⁴⁸Ca was studied in the 140A MeV ⁴⁸Ca + ⁹Be projectile fragmentation reaction, based on the parallel momentum distribution (p∥) of the residual fragments. A Fermi-type density distribution was employed to initialize the neutron density distributions in the LQMD simulations. A combined Gaussian function with different width parameters for the left side (Γ_L) and the right side (Γ_R) of the distribution was used to describe the p∥ of the residual fragments. Taking neutron-rich sulfur isotopes as examples, Γ_L shows a sensitive correlation with the δ_np of ⁴⁸Ca and is proposed as a probe for determining the neutron-skin thickness of the projectile nucleus.
This paper presents partially asynchronous parallel simulation of continuous systems (PAPSoCS) and some approaches to the issues of implementing it on a multicomputer system. To guarantee correct simulation results and speed up the simulation, a scheme for efficient PAPSoCS is proposed, and a virtual star topology is constructed to match the message-passing paths, addressing the algorithm-architecture adequation problem. Because the messages frequently passed between processors are short, typically no more than several times 4 bytes, an asynchronous communication mode is employed to reduce the communication ratio. Experimental results show that asynchronous parallel simulation has much higher efficiency than its synchronous counterpart.
Based on the full domain partition, a parallel finite element algorithm for the stationary Stokes equations is proposed and analyzed. In this algorithm, each subproblem is defined on the entire domain, with the majority of its degrees of freedom associated with the relevant subdomain. Each subproblem can therefore be solved in parallel with the others using an existing sequential solver without extensive recoding. This allows the algorithm to be implemented easily with low communication costs. Numerical results are given that show the high efficiency of the parallel algorithm.
In this paper, two kinds of iterative decoding for the AR4JA codes used in deep space communication, partly parallel decoding and overlapped partly parallel decoding, are analyzed, and their advantages and disadvantages are listed. A modified overlapped partly parallel decoding is proposed that inherits the advantages of the two algorithms while overcoming their shortcomings. The simulation results show that the three kinds of decoding have the same decoding performance, and that the modified overlapped partly parallel decoding improves the iterative convergence rate and the system throughput.
Because there is a certain angle between the belt conveyor and the parallel hoppers, and the hoppers are located away from the centerline of the blast furnace, particle size segregation is likely to occur in a bell-less top blast furnace with parallel hoppers. Understanding the law of particle size segregation in the hoppers could help in choosing better charging parameters and optimizing production and technical indices. Previous work on burden segregation at bell-less top blast furnaces with parallel hoppers paid more attention to falling-point segregation and circumferential mass-flow segregation during charging from the tilting chute, while ignoring the particle size segregation that occurs in the burden hoppers as burden falls from the belt conveyor. The latter is the basis for analyzing the former and plays a significant role in controlling the gas distribution in the blast furnace. The present work uses ternary mixtures of coke in three different particle sizes to simulate experimentally the size segregation of coke charged into the hoppers. The effect of the main striking point on size segregation is also investigated. The research shows a good linear relation between the segregation coefficient k and the dimensionless main striking point when the equation C = C_0^k is used to express the degree of size segregation in hoppers. This linear relation is proposed for the first time and provides a new way to predict size segregation in hoppers, which forms a theoretical basis and technical support for reducing the degree of size segregation in hoppers.
The conventional methodology for designing QC-LDPC decoders targets the fixed configurations used in wireless communication standards, and the largest supported expansion factor Z (the parallelism of the layered decoding) is a fixed number. In this paper, we study a circular-shifting network for decoding LDPC codes with an arbitrary Z factor, especially codes with large Z (Z > P), where P is the decoder parallelism. By buffering the P-length slices from the memory and assembling the shifted slices in a fixed routine, the P-parallelism shift network can process Z-parallelism circular-shifting tasks. The implementation results show that the proposed network for shifting data of arbitrary size incurs an additional resource cost of only about one times that of the traditional solution, which shifts data of at most size P, and achieves significant savings in area and routing complexity.
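The slice-based idea in the abstract above, performing a Z-length circular shift with hardware that only handles P values at a time, can be illustrated in software. The sketch assumes that Z is a multiple of P and that the shift is a left rotation; both are simplifying assumptions, and the function name is hypothetical.

```python
def sliced_circular_shift(data, shift, P):
    """Left-rotate `data` by `shift` positions using only P-length slices.

    Each output slice is assembled from the tail of one buffered input slice
    and the head of the next, which mirrors reading two slices from memory
    and merging them in a P-wide shifter.
    """
    Z = len(data)
    assert Z % P == 0, "this sketch assumes Z is a multiple of P"
    slices = [data[i:i + P] for i in range(0, Z, P)]
    n = len(slices)
    q, r = divmod(shift % Z, P)        # whole-slice offset and in-slice offset
    out = []
    for j in range(n):
        first = slices[(j + q) % n]
        second = slices[(j + q + 1) % n]
        out.extend(first[r:] + second[:r])
    return out


rotated = sliced_circular_shift(list(range(12)), 5, 4)
```

Here Z = 12 and P = 4, so each of the three output slices is built from two neighbouring input slices.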
Using an experimental setup, the series configuration (SC) and the parallel configuration (PC) of PV cell connections are studied to compare their performance under partial shading. The performance of the configurations is evaluated by comparing the open-circuit voltage, the short-circuit current, the maximum power point (MPP), the voltage and current at the MPP, and the fill factor (FF). The variations of the series resistance and the shunt resistance of a PV module under different irradiance levels are also determined by considering the effect of the thermal voltage. Finally, the performance losses in the different configurations are compared. The results show that, in the context of this work, the parallel configuration performs best under partial shading.
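The fill factor named in the abstract above is conventionally defined as the ratio of the power at the MPP to the product of the open-circuit voltage and the short-circuit current. The sketch below applies that standard definition; the numeric values are hypothetical and are not taken from the study.

```python
def fill_factor(v_oc, i_sc, v_mp, i_mp):
    """FF = (V_mp * I_mp) / (V_oc * I_sc), a dimensionless value between 0 and 1."""
    return (v_mp * i_mp) / (v_oc * i_sc)


# Hypothetical cell: V_oc = 0.6 V, I_sc = 3.0 A, MPP at 0.5 V and 2.7 A.
ff = fill_factor(v_oc=0.6, i_sc=3.0, v_mp=0.5, i_mp=2.7)
max_power = 0.5 * 2.7  # watts at the maximum power point
```

For these assumed values the fill factor is about 0.75; comparing the value across shaded and unshaded cases is one way to quantify the performance loss of a configuration.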
This paper presents a software turbo decoder on graphics processing units (GPUs). Unlike previous works, the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems (CCSDS) standard. However, the information frame lengths of the CCSDS turbo codes are not suitable for a flexible sub-frame parallelism design. To mitigate this issue, we propose a padding method that inserts several bits before the information frame header. To obtain low latency and high resource utilization, two-level intra-frame parallelism and an efficient data structure are considered. The presented Max-Log-MAP decoder can be adapted to decode the Long Term Evolution (LTE) turbo codes with only small modifications. At 10 iterations on an NVIDIA RTX 3070, the proposed CCSDS turbo decoder achieves throughputs of about 150 Mbps and 50 Mbps for the code rates 1/6 and 1/2, respectively.
Current parallel ankle rehabilitation robots (ARRs) have difficulty aligning the human-robot joint center of rotation in real time, which may cause secondary injuries to the patient. This study investigates the type synthesis of a parallel self-alignment ankle rehabilitation robot (PSAARR) based on the kinematic characteristics of ankle joint rotation center drift, from the perspective of introducing "suitable passive degrees of freedom (DOF)" of a suitable number and form. First, the self-alignment principle of a parallel ARR was proposed by deriving the conditions for transforming the human-robot closed chain (HRCC), formed by an ARR and the human body, into a kinematically suitably constrained system, and by introducing the conditions of "decoupled" and "less limb". Second, the relationship between the self-alignment principle and the actuation wrenches (twists) of the PSAARR was analyzed with the velocity Jacobian matrix as a "bridge". Subsequently, the type synthesis conditions of the PSAARR were proposed. Third, a PSAARR synthesis method based on screw theory was proposed, and the type synthesis of the PSAARR was conducted. Finally, an HRCC kinematic model was established to verify the self-alignment capability of the PSAARR. In this study, 93 types of PSAARR limb structures were synthesized, and the self-alignment capability of the human-robot joint axis was verified through kinematic analysis, which provides a theoretical basis for the design of such ARRs.
Funding (parallel particle swarm optimization for neural network training): This work was supported by the National Grand Fundamental Research "973" Program of China (No. 2004CB719401).
Funding (nanoparticle aggregation in MHD flow between rotating parallel plates): Large research project (RGP2/159/45) supported by the Deanship of Research and Graduate Studies at King Khalid University, Saudi Arabia.
Funding (parallel partial space elevator system): Supported by the Discovery Grant (No. RGPIN2018-05991) and the Discovery Accelerate Supplement Grant (No. RGPAS-2018-522709) of the Natural Sciences and Engineering Research Council of Canada, and by the Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515111056).
Funding (Hilbert-curve spatial data partitioning): Funded by the National 863 Program of China (No. 2005AA113150) and the National Natural Science Foundation of China (No. 40701158).
Funding (GPU-accelerated finite particle method): Financial support was provided by the National Key Research and Development Program of China (Grant No. 2016YFC0800200), the National Natural Science Foundation of China (Grant Nos. 51578494 and 51778568), and the Fundamental Research Funds for the Central Universities (Grant No. 2019QNA4043).
Funding (N-Best refinement for parallel finite element coastal model): Supported by the National Natural Science Foundation of China (Nos. 40406005, 41076001, 40440420596).
Funding: Project (61372136) supported by the National Natural Science Foundation of China
Abstract: The design, analysis and parallel implementation of the particle filter (PF) were investigated. Firstly, to tackle the particle degeneracy problem in the PF, an iterated importance density function (IIDF) was proposed, in which a new term associated with the current measurement information (CMI) was introduced into the expression of the sampled particles. Through the repeated use of the least squares estimate, the CMI can be integrated into the sampling stage in an iterative manner, which greatly improves the sampling quality. By running the IIDF, an iterated PF (IPF) can be obtained. Subsequently, a parallel resampling (PR) scheme was proposed for the parallel implementation of the IPF; its main idea is the same as that of systematic resampling (SR), but it is performed differently. The PR directly uses the integer part of the product of the particle weight and the particle number as the number of times that a particle is replicated, and it simultaneously eliminates the particles with the smallest weights; these are the two key differences from the SR. Finally, the detailed implementation procedures of the IPF based on the PR on the graphics processing unit were presented. The performance of the IPF, the PR and their parallel implementations is illustrated via a one-dimensional numerical simulation and a practical application to passive radar target tracking.
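The replication rule stated in this abstract (the integer part of weight times particle number) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `integer_part_counts` is hypothetical, and the way leftover slots are filled (largest fractional remainder first) is an assumption made here so that the particle count stays fixed; the abstract does not specify how the paper handles the remainder.

```python
import math


def integer_part_counts(weights):
    """Replication counts from the integer part of N * w_i.

    Each count can be computed independently per particle, which is what
    makes the rule attractive for parallel hardware.
    """
    n = len(weights)
    total = sum(weights)
    scaled = [n * w / total for w in weights]
    counts = [math.floor(s) for s in scaled]

    # Assumption (not from the abstract): hand the remaining slots to the
    # particles with the largest fractional remainders.
    shortfall = n - sum(counts)
    order = sorted(range(n), key=lambda i: scaled[i] - counts[i], reverse=True)
    for i in order[:shortfall]:
        counts[i] += 1
    return counts


weights = [0.5, 0.25, 0.125, 0.125]
counts = integer_part_counts(weights)
```

Particles whose scaled weight is below one receive an integer part of zero, so low-weight particles drop out unless the remainder step keeps them.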
Funding: This work was supported by the National Key Research and Development Program of China [Grant No. 2016YFC0800200], the National Natural Science Foundation of China [Grant Nos. 51778568, 51908492, and 52008366], and the Zhejiang Provincial Natural Science Foundation of China [Grant Nos. LQ21E080019 and LY21E080022]. This work was also supported by the Key Laboratory of Space Structures of Zhejiang Province (Zhejiang University) and the Center for Balance Architecture of Zhejiang University.
Abstract: Large deformation contact problems generally involve highly nonlinear behaviors, which are very time-consuming to compute and may lead to convergence issues. The finite particle method (FPM) effectively separates pure deformation from total motion in large deformation problems. In addition, the decoupled procedures of the FPM make it suitable for parallel computing, which may provide an approach to solve time-consuming issues. In this study, a graphics processing unit (GPU)-based parallel algorithm is proposed for two-dimensional large deformation contact problems. The fundamentals of the FPM for planar solids are first briefly introduced, including the equations of motion of particles and the internal forces of quadrilateral elements. Subsequently, a linked-list data structure suitable for parallel processing is built, and parallel global and local search algorithms are presented for contact detection. The contact forces are then derived and directly exerted on particles. The proposed method is implemented with the main solution procedures executed in parallel on a GPU. Two verification problems comprising large deformation frictional contacts are presented, and the accuracy of the proposed algorithm is validated. Furthermore, the algorithm's performance is investigated via a large-scale contact problem, and the maximum speedups of total computational time and contact calculation reach 28.5 and 77.4, respectively, relative to the commercial finite element software Abaqus/Explicit running on a single-core central processing unit (CPU). The contact calculation accounts for only 18% of the total calculation time with the FPM, much less than the 50% with Abaqus/Explicit, demonstrating the efficiency of the proposed method.
Funding: Supported by the National Natural Science Foundation of China (11273017)
Abstract: A new recursive algorithm with a partial parallel structure based on the linearly constrained minimum variance (LCMV) criterion for adaptive monopulse systems is proposed. The weight vector associated with the original whole antenna array is first decomposed into several adaptive weight sub-vectors. An adaptive algorithm based on the conventional LCMV principle is then deduced to update the weight sub-vectors for the sum and difference beams, respectively. The optimal weight vector can be obtained after convergence. The computational complexity required by the proposed technique is evaluated; it is on the order of O(N) and less than that of the conventional LCMV method. The flow chart scheme with the partial parallel structure of the proposed algorithm is introduced. This scheme is easy to implement on a distributed computer/digital signal processor (DSP) system to solve the problems of the heavy computational burden and vast data transmission of the large-scale adaptive monopulse array. Then, the monopulse ratio and convergence rate of the proposed algorithm are evaluated by numerical simulations. Compared with some recent adaptive monopulse estimation methods, the proposed adaptive method achieves better performance in computational complexity and monopulse ratio.
Funding: Financially supported by the National Natural Science Foundation of China (Grant Nos. 12072217 and 42077254) and the Natural Science Foundation of Hunan Province, China (Grant No. 2022JJ30567).
Abstract: The high-resolution DEM-IMB-LBM model can accurately describe pore-scale fluid-solid interactions, but its potential for use in geotechnical engineering analysis has not been fully realised because of its prohibitive computational costs. To overcome this limitation, a message passing interface (MPI) parallel DEM-IMB-LBM framework is proposed to enhance computational efficiency. This framework utilises a static domain decomposition scheme, in which the entire computation domain is decomposed into multiple subdomains according to the predefined processors. A detailed parallel strategy is employed for both contact detection and hydrodynamic force calculation. In particular, a particle ID re-numbering scheme is proposed to handle particle transitions across sub-domain interfaces. Two benchmarks are conducted to validate the accuracy and overall performance of the proposed framework. Subsequently, the framework is applied to simulate scenarios involving multi-particle sedimentation and submarine landslides. The numerical examples effectively demonstrate the robustness and applicability of the MPI parallel DEM-IMB-LBM framework.
Funding: The National Natural Science Foundation of China (Nos. 12375123, 11975091, and 12305130), the Natural Science Foundation of Henan Province (No. 242300421048), the China Postdoctoral Science Foundation (No. 2023M731016), and the Henan Postdoctoral Foundation (No. HN2022164).
Abstract: Neutron-skin thickness is a key parameter for a neutron-rich nucleus; however, it is difficult to determine. In the framework of the Lanzhou Quantum Molecular Dynamics (LQMD) model, a possible probe for the neutron-skin thickness (δ_(np)) of neutron-rich ^(48)Ca was studied in the 140A MeV ^(48)Ca + ^(9)Be projectile fragmentation reaction based on the parallel momentum distribution (p∥) of the residual fragments. A Fermi-type density distribution was employed to initiate the neutron density distributions in the LQMD simulations. A combined Gaussian function with different width parameters for the left side (Γ_(L)) and the right side (Γ_(R)) of the distribution was used to describe the p∥ of the residual fragments. Taking neutron-rich sulfur isotopes as examples, Γ_(L) shows a sensitive correlation with δ_(np) of ^(48)Ca, and is proposed as a probe for determining the neutron-skin thickness of the projectile nucleus.
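A two-sided ("combined") Gaussian of the kind described in this abstract can be sketched as follows. The abstract does not give the exact functional form, so the parameterisation is an assumption: the widths are treated as standard-deviation-like parameters, and the names `two_sided_gaussian`, `p0` and `amplitude` are hypothetical.

```python
import math


def two_sided_gaussian(p, p0, gamma_l, gamma_r, amplitude=1.0):
    """Gaussian with separate widths left and right of the peak position p0.

    Assumption: gamma_l and gamma_r act as standard-deviation-like widths;
    the cited work may define them differently (for example as FWHM).
    """
    width = gamma_l if p < p0 else gamma_r
    return amplitude * math.exp(-((p - p0) ** 2) / (2.0 * width ** 2))


# A wider left side gives a heavier low-momentum tail at equal distance from p0.
left = two_sided_gaussian(-1.0, 0.0, gamma_l=2.0, gamma_r=1.0)
right = two_sided_gaussian(1.0, 0.0, gamma_l=2.0, gamma_r=1.0)
peak = two_sided_gaussian(0.0, 0.0, gamma_l=2.0, gamma_r=1.0)
```

Fitting such a function to a simulated p∥ spectrum yields the two widths separately, which is what allows the left-side width to be examined on its own.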
Abstract: This paper presents partially asynchronous parallel simulation of continuous systems (PAPSoCS) and some approaches to the issues of implementing it on a multicomputer system. To guarantee correct simulation results and speed up the simulation, a scheme for efficient PAPSoCS is proposed, and a virtual star topology is constructed to match the message-passing paths, addressing the algorithm-architecture adequation problem. Because the messages frequently passed between processors are short, typically only a few 4-byte words, asynchronous communication mode is employed to reduce the communication ratio. Experimental results show that asynchronous parallel simulation has much higher efficiency than its synchronous counterpart.
Funding: Project supported by the National Natural Science Foundation of China (No. 10971166), the National Basic Research Program (No. 2005CB321703), and the Science and Technology Foundation of Guizhou Province of China (No. [2008]2123)
Abstract: Based on the full domain partition, a parallel finite element algorithm for the stationary Stokes equations is proposed and analyzed. In this algorithm, each subproblem is defined in the entire domain, with the majority of the degrees of freedom associated with the relevant subdomain. Therefore, it can be solved in parallel with other subproblems using an existing sequential solver without extensive recoding. This allows the algorithm to be implemented easily with low communication costs. Numerical results are given showing the high efficiency of the parallel algorithm.
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 61032003) and the Fundamental Research Funds for the Central Universities (Grant No. HIT.NSRIF.2012021)
Abstract: In this paper, for the AR4JA codes used in deep space communication, two kinds of iterative decoding, partly parallel decoding and overlapped partly parallel decoding, are analyzed, and their advantages and disadvantages are listed. A modified overlapped partly parallel decoding is proposed that inherits the advantages of the two algorithms and overcomes their shortcomings. The simulation results show that the three kinds of decoding have the same decoding performance, and that the modified overlapped partly parallel decoding improves the iterative convergence rate and the throughput of the system.
Abstract: Because a certain angle exists between the belt conveyor and the parallel hoppers, and the hoppers are located away from the centerline of the blast furnace, particle size segregation is likely to happen in a bell-less top blast furnace with parallel hoppers. Mastering the law of particle size segregation in hoppers could help to choose better charging parameters and optimize production and technical indices. Previous works on burden segregation at a bell-less top blast furnace with parallel hoppers paid more attention to the falling point segregation and the circumferential mass flow segregation while charging from the tilting chute, but ignored the particle size segregation in burden hoppers as the burden falls from the belt conveyor, which is the basis for analyzing the former and plays a significant role in controlling the gas distribution in the blast furnace. The present work takes ternary mixtures of coke in three different particle sizes to simulate, by experiments, the size segregation of the coke charged into the hoppers. The effect of the main striking point on size segregation is also investigated. The research shows that there is a good linear relation between the segregation coefficient k and the dimensionless main striking point when the equation C = C_0~k is used to express the degree of size segregation in hoppers. The linear relation is proposed for the first time and provides a new way to predict the size segregation in hoppers, which forms a theoretical basis and technical support for reducing the degree of size segregation in hoppers.
Abstract: The conventional methodology for designing QC-LDPC decoders targets the fixed configurations used in wireless communication standards, and the largest supported expansion factor Z (the parallelism of the layered decoding) is a fixed number. In this paper, we study the circular-shifting network for decoding LDPC codes with an arbitrary Z factor, especially for decoding large-Z (Z > P) codes, where P is the decoder parallelism. By buffering the P-length slices from the memory and assembling the shifted slices in a fixed routine, the P-parallelism shift network can process Z-parallelism circular-shifting tasks. The implementation results show that the proposed network for shifting data of arbitrary size consumes only one times the additional resource cost compared with the traditional solution, which shifts data of at most size P, and achieves significant savings in area and routing complexity.
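The idea of building a Z-length circular shift out of P-length slices can be illustrated with a small Python sketch. It is not the network from the paper: the function name `sliced_circular_shift` is hypothetical, and the sketch assumes for simplicity that Z is a multiple of P, whereas the abstract targets arbitrary Z.

```python
def sliced_circular_shift(data, shift, p):
    """Left-rotate `data` by `shift` while touching only P-length slices.

    Assumption (for simplicity): len(data) is a multiple of p.
    """
    z = len(data)
    assert z % p == 0, "sketch assumes Z is a multiple of P"
    slices = [data[i:i + p] for i in range(0, z, p)]  # slices as stored in memory
    n = len(slices)
    q, r = divmod(shift % z, p)  # whole-slice offset and in-slice offset

    out = []
    for j in range(n):
        # Each output slice is assembled from two neighbouring input slices.
        first = slices[(j + q) % n]
        second = slices[(j + q + 1) % n]
        out.extend((first + second)[r:r + p])
    return out


data = list(range(12))                        # Z = 12
shifted = sliced_circular_shift(data, 5, 4)   # P = 4
```

Each output slice needs only two buffered input slices and a shift within a window of 2P elements, so the datapath width depends on P rather than on Z.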
Abstract: Using an experimental setup, the series configuration (SC) and the parallel configuration (PC) of PV cell connections are studied to compare their performance under partial shading. The performance of the configurations is evaluated by comparing the open-circuit voltage, the short-circuit current, the maximum power point (MPP), the voltage and current corresponding to the MPP, and the fill factor (FF). The variations of the series resistance and the shunt resistance of a PV module under different irradiance levels are also determined by considering the effect of thermal voltage. Finally, a comparison between the performance losses in the different configurations is presented. The results of this study show that, in the context of this work, the parallel configuration has the best performance under partial shading.
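The fill factor used as a comparison metric here follows the standard definition FF = (V_mp × I_mp) / (V_oc × I_sc). A minimal Python sketch is given below; the function name and the numerical values are hypothetical and are not measurements from the study.

```python
def fill_factor(v_oc, i_sc, v_mp, i_mp):
    """Fill factor: maximum power divided by the product V_oc * I_sc."""
    return (v_mp * i_mp) / (v_oc * i_sc)


# Hypothetical single-cell values (volts and amperes), for illustration only.
ff = fill_factor(v_oc=0.6, i_sc=3.0, v_mp=0.5, i_mp=2.7)
```

Because the fill factor normalises the maximum power by V_oc and I_sc, it allows series and parallel configurations with very different voltage and current levels to be compared on the same scale.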
Funding: Supported by the Fundamental Research Funds for the Central Universities (FRF-TP20-062A1) and the Guangdong Basic and Applied Basic Research Foundation (2021A1515110070).
Abstract: This paper presents a software turbo decoder on graphics processing units (GPU). Unlike previous works, the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems (CCSDS) standard. However, the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design. To mitigate this issue, we propose a padding method that inserts several bits before the information frame header. To obtain low latency and high resource utilization, two-level intra-frame parallelism and an efficient data structure are considered. The presented Max-Log-MAP decoder can be adapted to decode the Long Term Evolution (LTE) turbo codes with only small modifications. The proposed CCSDS turbo decoder at 10 iterations on an NVIDIA RTX3070 achieves throughputs of about 150 Mbps and 50 Mbps for the code rates 1/6 and 1/2, respectively.
Funding: Supported by Key Scientific Research Platforms and Projects of Guangdong Regular Institutions of Higher Education of China (Grant No. 2022KCXTD033), Guangdong Provincial Natural Science Foundation of China (Grant No. 2023A1515012103), Guangdong Provincial Scientific Research Capacity Improvement Project of Key Developing Disciplines of China (Grant No. 2021ZDJS084), and National Natural Science Foundation of China (Grant No. 52105009).
Abstract: Current parallel ankle rehabilitation robots (ARRs) suffer from the difficulty of aligning the human-robot joint center of rotation in real time, which may lead to secondary injuries to the patient. This study investigates the type synthesis of a parallel self-alignment ankle rehabilitation robot (PSAARR) based on the kinematic characteristics of ankle joint rotation center drift, from the perspective of introducing "suitable passive degrees of freedom (DOF)" with a suitable number and form. First, the self-alignment principle of the parallel ARR was proposed by deriving conditions for transforming a human-robot closed chain (HRCC) formed by an ARR and the human body into a kinematically suitable constrained system and by introducing the conditions of "decoupled" and "less limb". Second, the relationship between the self-alignment principle and the actuation wrenches (twists) of the PSAARR was analyzed with the velocity Jacobian matrix as a "bridge". Subsequently, the type synthesis conditions of the PSAARR were proposed. Third, a PSAARR synthesis method was proposed based on screw theory, and the type synthesis of the PSAARR was conducted. Finally, an HRCC kinematic model was established to verify the self-alignment capability of the PSAARR. In this study, 93 types of PSAARR limb structures were synthesized, and the self-alignment capability of a human-robot joint axis was verified through kinematic analysis, which provides a theoretical basis for the design of such an ARR.