For flat fast-fading Multiple-Input Multiple-Output (MIMO) channels, this paper presents a sampling-based channel estimation scheme and an iterative Particle Filter (PF) signal detection scheme. The channel estimation comprises two parts: an adaptive iterative update of the channel distribution mean and a regular update of the "adaptability" via pilots. In the detection procedure, the PF is employed to produce the optimal decision given the known received signal and the sequence of channel samples; an asymptotically optimal importance density is constructed and, depending on the asymptotic update order, the Parallel Importance Update (PIU) and Serial Importance Update (SIU) schemes are performed respectively. The simulation results show that, for the given fading channel and an appropriately selected pilot mode, the proposed scheme is more robust than the conventional Kalman-filter-based superimposed detection scheme.
Conventional gradient-based full waveform inversion (FWI) is a local optimization that is highly dependent on the initial model and prone to becoming trapped in local minima. Globally optimal FWI, which can overcome this limitation, is particularly attractive but is currently limited by its huge computational cost. In this paper, we propose a globally optimal FWI framework based on GPU parallel computing, which greatly improves efficiency and is expected to make globally optimal FWI more widely used. In this framework, we simplify and recombine the model parameters and optimize the model iteratively. Each iteration contains hundreds of individuals; each individual is independent of the others and comprises forward modeling and cost function calculation. The framework is suitable for a variety of global optimization algorithms, and we test it with the particle swarm optimization algorithm as an example. Both the synthetic and field examples achieve good results, indicating the effectiveness of the framework.
Funding: This work was funded by the National Natural Science Foundation of China (Grant Nos. 42276255 and 41976227) and the project "Impact and Response of Antarctic Seas to Climate Change, IRASCC 2020-2022" (Grant Nos. 01-01-02A and 02-02-05).
Abstract: The Southern Ocean is an important carbon sink and plays a critical role in global carbon cycling. The inshore area of the Amundsen Sea has been reported to be highly productive within the Southern Ocean. To investigate the influence of transparent exopolymer particles (TEP) on the behavior of dissolved organic carbon (DOC) in this region, a comprehensive study was conducted, encompassing both open-water areas and highly productive polynyas. Microbial heterotrophic metabolism was found to be the primary process responsible for the production of humic-like fluorescent components in the open ocean. The relationship between apparent oxygen utilization and the two humic-like components can be accurately described by a power-law function, with a conversion rate consistent with that observed globally. The presence of TEP was found to have little impact on this process. Additionally, the study revealed the accumulation of DOC at the sea surface in the Amundsen Sea Polynya, suggesting that TEP may play a critical role in this phenomenon. These findings contribute to a deeper understanding of the dynamics and surface accumulation of DOC in the Amundsen Sea Polynya and provide valuable insights into the carbon cycle in this region.
Funding: Financial support was provided by the National Key Research and Development Program of China (Grant No. 2016YFC0800200), the National Natural Science Foundation of China (Grant Nos. 51578494 and 51778568), and the Fundamental Research Funds for the Central Universities (Grant No. 2019QNA4043).
Abstract: As a novel particle method for explicit dynamics, the finite particle method (FPM) does not require the formation or solution of global matrices, and the evaluations of the element equivalent forces and particle displacements are decoupled in nature, which makes the method suitable for parallelization. The FPM also requires an acceleration strategy to overcome the heavy computational burden of its explicit framework for time-dependent dynamic analysis. To this end, a GPU-accelerated parallel strategy for the FPM is proposed in this paper. By taking advantage of the independence of each step of the FPM workflow, a generic parallelized computational framework for multiple types of analysis is established. Using the Compute Unified Device Architecture (CUDA), the GPU implementations of the main tasks of the FPM, such as evaluating and assembling the element equivalent forces and solving the kinematic equations for particles, are elaborated through careful thread management and memory optimization. Performance tests show that speedup ratios of 8, 25 and 48 are achieved for beams, hexahedral solids and triangular shells, respectively. For examples consisting of explicit dynamic analyses of shells and solids, comparisons with Abaqus using 1 to 8 CPU cores validate the accuracy of the results and demonstrate a maximum speed improvement of a factor of 11.2.
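The decoupling described in the abstract above (element forces evaluated independently, then assembled onto particles, then each particle advanced on its own) can be illustrated with a minimal serial sketch. It assumes a 1D chain of linear spring elements and an explicit central-difference update, which is a simplification and not the authors' formulation; the names `element_forces`, `assemble` and `step` and all parameter values are hypothetical. In a GPU version, each loop iteration would roughly correspond to one thread.

```python
def element_forces(x, rest_len, k):
    """Axial force in each two-particle element; every entry is independent."""
    return [k * ((x[e + 1] - x[e]) - rest_len) for e in range(len(x) - 1)]


def assemble(f_elem, n):
    """Scatter element forces onto the two end particles of each element."""
    f = [0.0] * n
    for e, fe in enumerate(f_elem):
        f[e] += fe        # tension pulls the left particle to the right
        f[e + 1] -= fe    # and the right particle to the left
    return f


def step(x, x_prev, f_ext, mass, dt, rest_len, k, fixed=(0,)):
    """One central-difference update; each particle is updated independently."""
    f_int = assemble(element_forces(x, rest_len, k), len(x))
    x_new = []
    for i in range(len(x)):
        if i in fixed:
            x_new.append(x[i])  # constrained particle keeps its position
        else:
            a = (f_int[i] + f_ext[i]) / mass
            x_new.append(2.0 * x[i] - x_prev[i] + a * dt * dt)
    return x_new, x  # new positions and the positions to use as x_prev next
```

No global stiffness matrix appears anywhere in the sketch: the only shared write is the scatter in `assemble`, which is the step that would need atomic additions or a coloring scheme when run in parallel.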
Abstract: A new formulation for tracking multiple particles in slow viscous flow for microfluidic applications is presented. The method manipulates the boundary element matrices to obtain a system of equations relating the rigid-body velocities of each particle to the forces applied to it. The formulation is specially designed for particle trajectory tracking and involves successive matrix multiplications, to which symmetric multiprocessing (SMP) parallelisation is applied. The formulation is observed to offer an efficient numerical model for particle tracking that can easily be extended to multiphysics simulations in which several physics are involved.
Funding: Project (61372136) supported by the National Natural Science Foundation of China.
Abstract: The design, analysis and parallel implementation of the particle filter (PF) were investigated. Firstly, to tackle the particle degeneracy problem in the PF, an iterated importance density function (IIDF) was proposed, in which a new term associated with the current measurement information (CMI) was introduced into the expression of the sampled particles. Through repeated use of the least squares estimate, the CMI can be integrated into the sampling stage in an iterative manner, which greatly improves the sampling quality. By running the IIDF, an iterated PF (IPF) can be obtained. Subsequently, a parallel resampling (PR) scheme was proposed for the parallel implementation of the IPF; its main idea is the same as that of systematic resampling (SR), but it is performed differently. The PR directly uses the integer part of the product of the particle weight and the particle number as the number of times a particle is replicated, and it simultaneously eliminates the particles with the smallest weights; these are the two key differences from the SR. Finally, the detailed implementation procedures of the PR-based IPF on the graphics processing unit were presented. The performance of the IPF, the PR and their parallel implementations is illustrated via a one-dimensional numerical simulation and a practical application to passive radar target tracking.
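The replication rule stated in the abstract above (copy particle i as many times as the integer part of N times its weight) can be sketched as follows. The abstract does not say how the particle count is brought back to N after truncation; the sketch fills the leftover slots by largest fractional remainder, which is an assumption of this example and not necessarily the paper's choice. The function names `pr_counts` and `resample` are hypothetical.

```python
import math


def pr_counts(weights):
    """Replication counts: integer part of N * w_i for normalized weights."""
    n = len(weights)
    total = sum(weights)
    scaled = [n * w / total for w in weights]
    counts = [math.floor(s) for s in scaled]  # independent per particle
    # ASSUMPTION (not from the abstract): refill the leftover slots by
    # largest fractional remainder so that the particle count stays N.
    leftover = n - sum(counts)
    order = sorted(range(n), key=lambda i: scaled[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def resample(particles, weights):
    """Replicate each particle according to its count; count 0 removes it."""
    out = []
    for p, c in zip(particles, pr_counts(weights)):
        out.extend([p] * c)
    return out
```

Because each count depends only on that particle's own weight, the truncation step needs no cumulative sum over all weights, which is what makes this style of resampling easier to parallelize than systematic resampling.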
Funding: This work was supported by the National Key Research and Development Program of China [Grant No. 2016YFC0800200], the National Natural Science Foundation of China [Grant Nos. 51778568, 51908492, and 52008366], and the Zhejiang Provincial Natural Science Foundation of China [Grant Nos. LQ21E080019 and LY21E080022]. This work was also supported by the Key Laboratory of Space Structures of Zhejiang Province (Zhejiang University) and the Center for Balance Architecture of Zhejiang University.
Abstract: Large deformation contact problems generally involve highly nonlinear behaviors, which are very time-consuming to compute and may lead to convergence issues. The finite particle method (FPM) effectively separates pure deformation from total motion in large deformation problems. In addition, the decoupled procedures of the FPM make it suitable for parallel computing, which may offer a way to reduce the computational time. In this study, a graphics processing unit (GPU)-based parallel algorithm is proposed for two-dimensional large deformation contact problems. The fundamentals of the FPM for planar solids are first briefly introduced, including the equations of motion of particles and the internal forces of quadrilateral elements. Subsequently, a linked-list data structure suitable for parallel processing is built, and parallel global and local search algorithms are presented for contact detection. The contact forces are then derived and directly exerted on particles. The proposed method is implemented with the main solution procedures executed in parallel on a GPU. Two verification problems comprising large deformation frictional contacts are presented, and the accuracy of the proposed algorithm is validated. Furthermore, the algorithm's performance is investigated via a large-scale contact problem, and the maximum speedups of total computational time and contact calculation reach 28.5 and 77.4, respectively, relative to the commercial finite element software Abaqus/Explicit running on a single-core central processing unit (CPU). Contact calculation accounts for only 18% of the total calculation time with the FPM, much less than the 50% with Abaqus/Explicit, demonstrating the efficiency of the proposed method.
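The abstract above mentions a linked-list structure and a global search for contact detection without giving details. One widely used form of this idea is the cell linked list, sketched below as a generic illustration and not as the authors' algorithm. The arrays `head` and `nxt`, the uniform cell size, and the assumption that all points lie inside the grid `[0, nx*cell) x [0, ny*cell)` are choices made for this example.

```python
def build_cell_list(points, cell, nx, ny):
    """head[c] is the first point in cell c; nxt[i] is the next point in i's cell."""
    head = [-1] * (nx * ny)
    nxt = [-1] * len(points)
    for i, (x, y) in enumerate(points):
        c = int(y // cell) * nx + int(x // cell)
        nxt[i] = head[c]   # push point i onto the front of the cell's list
        head[c] = i
    return head, nxt


def candidates(q, cell, nx, ny, head, nxt):
    """Global search: indices of points in the 3x3 block of cells around q."""
    cx, cy = int(q[0] // cell), int(q[1] // cell)
    found = []
    for j in range(max(cy - 1, 0), min(cy + 2, ny)):
        for i in range(max(cx - 1, 0), min(cx + 2, nx)):
            k = head[j * nx + i]
            while k != -1:     # walk the linked list of this cell
                found.append(k)
                k = nxt[k]
    return sorted(found)
```

The candidate list is what a subsequent local search (for example, an exact point-to-segment distance test) would refine; storing the lists in two flat integer arrays rather than pointer-based nodes is what makes the structure convenient for GPU memory.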
Funding: This work was supported by the National Grand Fundamental Research "973" Program of China (No. 2004CB719401).
Abstract: Objective: To reduce the execution time of neural network training. Methods: A parallel particle swarm optimization algorithm based on the master-slave model is proposed to train radial basis function neural networks; it is implemented on a cluster using MPI libraries for inter-process communication. Results: A high speed-up factor is achieved and execution time is greatly reduced. In addition, the resulting neural network has good classification accuracy on both the training sets and the test sets. Conclusion: Because the fitness evaluation is computationally intensive, parallel particle swarm optimization shows great advantages in speeding up neural network training.
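The master-slave pattern named in the abstract above can be sketched as follows: the master keeps the swarm state and hands the costly fitness evaluations to workers. This is an illustration under stated assumptions, not the paper's implementation: a thread pool from the Python standard library stands in for MPI worker processes, the `sphere` function stands in for the radial basis function network training error, and the inertia and acceleration coefficients (0.7, 1.5, 1.5) are arbitrary example values.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sphere(x):
    """Placeholder fitness; the paper's fitness is network training error."""
    return sum(v * v for v in x)


def parallel_pso(fitness, dim, n_particles=16, iters=50, workers=4, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest, pbest_f = [p[:] for p in pos], [float("inf")] * n_particles
    gbest, gbest_f = pos[0][:], float("inf")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iters):
            # Master farms out the expensive fitness evaluations to workers.
            scores = list(pool.map(fitness, pos))
            for i, f in enumerate(scores):
                if f < pbest_f[i]:
                    pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
            # Master updates velocities and positions (cheap, kept serial).
            for i in range(n_particles):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][d] = (0.7 * vel[i][d]
                                 + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                                 + 1.5 * r2 * (gbest[d] - pos[i][d]))
                    pos[i][d] += vel[i][d]
    return gbest, gbest_f
```

Threads give no real speed-up for a pure-Python fitness function; the sketch only shows where the parallel boundary sits. With a genuinely expensive fitness, the same `map` call would be served by separate processes or MPI ranks.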
Abstract: Because there is an angle between the belt conveyor and the parallel hoppers, and because the hoppers are located away from the centerline of the blast furnace, particle size segregation is likely to occur in a bell-less top blast furnace with parallel hoppers. Understanding the law of particle size segregation in the hoppers could help in choosing better charging parameters and optimizing production and technical indices. Previous work on burden segregation in bell-less top blast furnaces with parallel hoppers paid more attention to falling-point segregation and circumferential mass flow segregation during charging from the tilting chute, and ignored the particle size segregation that arises in the burden hoppers as the burden falls from the belt conveyor; the latter is the very basis for analyzing the former and plays a significant role in controlling the gas distribution in the blast furnace. The present work uses ternary mixtures of coke in three different particle sizes to simulate experimentally the size segregation of the coke charged into the hoppers. The effect of the main striking point on size segregation is also investigated. The research shows that there is a good linear relation between the segregation coefficient k and the dimensionless main striking point when the equation C = C_0~k is used to express the degree of size segregation in hoppers. The linear relation is proposed for the first time and provides a new way to predict size segregation in hoppers, which forms a theoretical basis and technical support for reducing the degree of size segregation in hoppers.
Funding: Supported by the Hubei Provincial Department of Education Teaching Research Project (2016294, 2017320), the Hubei Provincial Humanities and Social Science Research Project (17D033), the College Students Innovation and Entrepreneurship Training Program (National) (20191050013), the Hubei Province Natural Science Foundation General Project (2021CFB584), and the 2023 College Student Innovation and Entrepreneurship Training Program Project (202310500047, 202310500049).
Abstract: In recent years, numerical weather forecasting has been increasingly emphasized. Variational data assimilation furnishes precise initial values for numerical forecasting models and constitutes an inherently nonlinear optimization problem. The sheer size of the dataset under consideration gives rise to substantial computational burdens, complex modeling, and high hardware requirements. This paper employs the Dual-Population Particle Swarm Optimization (DPSO) algorithm in variational data assimilation to enhance assimilation accuracy. By harnessing parallel computing principles, the paper introduces the Parallel Dual-Population Particle Swarm Optimization (PDPSO) algorithm to reduce the processing time. Simulations were carried out using partial differential equations, and comparisons in terms of time and accuracy were made against DPSO, the Dynamic Weight Particle Swarm Algorithm (PSOCIWAC), and the Time-Varying Double Compression Factor Particle Swarm Algorithm (PSOTVCF). Experimental results indicate that the proposed PDPSO outperforms PSOCIWAC and PSOTVCF in convergence accuracy and is comparable to DPSO. Regarding processing time, PDPSO is 40% faster than PSOCIWAC and PSOTVCF and 70% faster than DPSO.
Funding: The University of Vigo is acknowledged for financing part of the first author's PhD studies, and the Spanish Ministry of Economy and Competitiveness for funding the project 'Deepening on the behaviour of rock masses: Scale effects on the stress-strain response of fissured rock samples with particular emphasis on post-failure', awarded under Contract Reference No. RTI2018-093563-B-I00 and partially financed by means of European Regional Development Funds from the European Union (EU).
Abstract: This study presents a calibration process for three-dimensional particle flow code (PFC3D) simulation of intact and fissured granite samples. First, the laboratory stress-strain response from triaxial testing of intact and fissured granite samples is recalled. Then, PFC3D is introduced, with a focus on bonded particle models (BPM). After that, we present previous studies in which intact rock is simulated by means of flat-joint approaches, and show how improved accuracy was gained with the help of parametric studies. Then, models of the pre-fissured rock specimens were generated, including modeled fissures in the form of "smooth joint" type contacts. Finally, triaxial testing simulations of 1 + 2 and 2 + 3 jointed rock specimens were performed. Results show that both the elastic behavior and the peak strength levels are closely matched, without any additional fine-tuning of micro-mechanical parameters. Concerning the post-failure behavior, the models reproduce the trends of decreasing dilation with increasing confinement and plasticity. However, the simulated dilation values are larger than those observed in practice. This is attributed to the difficulty of modeling some phenomena of fissured rock behavior, such as rock piece corner crushing with dust production and interactions between newly formed shear bands or axial splitting cracks and pre-existing joints.
基金supported by the National High-tech R&D Program of China(2015AA70560452015AA8017032P)the National Natural Science Foundation of China(61401504)
Abstract: The particle filter (PF) algorithm is one of the most commonly used algorithms for maneuvering target tracking. The traditional PF maps multi-dimensional information to one-dimensional information when calculating particle weights. This lossy mapping causes the particle prediction information to mismatch the weight information; in essence, it reduces the information entropy of the useful information. To solve this problem, a dual-channel independent filtering method is proposed based on the idea of equalization mapping. First, particle prediction performance is described by particle manipulations in different dimensions, which improves the accuracy of particle prediction. The algorithm's improvement of particle degradation is analyzed in terms of particle weight and effective particle number. Second, to address the lack of particle samples, new particles are generated from the filtering results, which increases particle diversity. Finally, a graphics processing unit (GPU) parallel computing platform is introduced, and "channel-level" and "particle-level" parallel programs are designed to accelerate the algorithm. Simulation results show that, compared with the traditional algorithm on a CPU platform, the proposed algorithm offers better filtering precision, higher particle efficiency, and faster computation.
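The effective particle number mentioned above is commonly estimated as N_eff = 1 / Σ w_i² over normalized weights. The sketch below computes it and pairs it with systematic resampling, a standard textbook remedy for weight collapse. It is not the dual-channel method or the particle regeneration step of the abstract, and the function names are illustrative assumptions.

```python
import numpy as np


def effective_particle_number(weights):
    """Standard estimate N_eff = 1 / sum(w_i^2) for normalized weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)


def systematic_resample(weights, rng):
    """Systematic resampling (textbook scheme, not the paper's method).

    Returns the indices of the particles to keep; heavily weighted
    particles are duplicated and lightly weighted ones tend to be dropped.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = len(w)
    # One random offset, then n evenly spaced pointers in [0, 1).
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(w)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    return np.searchsorted(cumulative, positions, side="right")


rng = np.random.default_rng(0)
weights = np.array([0.70, 0.10, 0.10, 0.05, 0.05])
n_eff = effective_particle_number(weights)
if n_eff < 0.5 * len(weights):  # the 0.5 threshold is an assumed convention
    kept = systematic_resample(weights, rng)
```

N_eff equals the particle count when all weights are equal and drops toward 1 as the weight concentrates on a single particle, which is why it is a convenient indicator of degradation.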
Abstract: Particle accelerators play an important role in a wide range of scientific discoveries and industrial applications. Self-consistent multi-particle simulation based on the particle-in-cell (PIC) method has been used to study charged particle beam dynamics inside these accelerators. However, PIC simulation is time-consuming and requires modern parallel computers for high-resolution applications. In this paper, we implemented a parallel beam dynamics PIC code on multi-node hybrid architecture computers with multiple Graphics Processing Units (GPUs). We used two methods to parallelize the PIC code on multiple GPUs and observed that the replication method is the better choice for moderate problem sizes and current computer hardware, while the domain decomposition method might be the better choice for large problem sizes and more advanced hardware that allows direct communication among multiple GPUs. Using the multi-node hybrid architectures at the Oak Ridge Leadership Computing Facility (OLCF), the optimized GPU PIC code achieves reasonable parallel performance and scales up to 64 GPUs with 16 million particles.
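The abstract does not spell out the replication method. The sketch below assumes it means that every worker holds a full copy of the field grid and deposits only its own share of the particles, after which the grid copies are summed. Workers are simulated serially with NumPy on a hypothetical periodic 1D grid, and all function names are illustrative.

```python
import numpy as np


def deposit_charge(positions, charges, n_cells, length):
    """Nearest-grid-point charge deposition on a periodic 1D grid."""
    cell = np.floor(positions / length * n_cells).astype(int) % n_cells
    grid = np.zeros(n_cells)
    # np.add.at accumulates correctly when several particles share a cell.
    np.add.at(grid, cell, charges)
    return grid


def deposit_replicated(positions, charges, n_cells, length, n_workers):
    """Assumed replication scheme: split particles, replicate the grid."""
    pos_chunks = np.array_split(positions, n_workers)
    q_chunks = np.array_split(charges, n_workers)
    # Each "worker" fills its own full-size grid from its particle subset.
    local_grids = [deposit_charge(p, q, n_cells, length)
                   for p, q in zip(pos_chunks, q_chunks)]
    # Global reduction: summing the copies gives the total charge density.
    return np.sum(local_grids, axis=0)


rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 1.0, size=10_000)
charges = np.ones_like(positions)
grid = deposit_replicated(positions, charges, n_cells=64, length=1.0,
                          n_workers=4)
```

Under this assumption the only inter-worker communication is the grid reduction, whose cost grows with grid size rather than particle count. That is consistent with the abstract's remark that replication suits moderate problem sizes, while domain decomposition would instead split the grid itself.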
Funding: Supported by the National Natural Science Foundation of China (No. 60672047) and the Shanghai Postdoctoral Scientific Program (No. 05R214110).
Abstract: For flat fast-fading Multiple-Input Multiple-Output (MIMO) channels, this paper presents a sampling-based channel estimation scheme and an iterative Particle Filter (PF) signal detection scheme. The channel estimation comprises two parts: an adaptive iterative update of the channel distribution mean and a regular update of the "adaptability" via pilot symbols. In the detection procedure, the PF is employed to produce the optimal decision given the known received signal and the sequence of channel samples. An asymptotically optimal importance density is constructed, and, depending on the asymptotic update order, the Parallel Importance Update (PIU) and Serial Importance Update (SIU) schemes are performed respectively. The simulation results show that, for the given fading channel, if an appropriate pilot mode is selected, the proposed scheme is more robust than the conventional Kalman-filter-based superimposed detection scheme.
Abstract: Conventional gradient-based full waveform inversion (FWI) is a local optimization method that depends heavily on the initial model and is prone to becoming trapped in local minima. Globally optimal FWI, which can overcome this limitation, is particularly attractive but is currently limited by its huge computational cost. In this paper, we propose a globally optimal FWI framework based on GPU parallel computing, which greatly improves efficiency and is expected to make globally optimal FWI more widely used. In this framework, we simplify and recombine the model parameters and optimize the model iteratively. Each iteration involves hundreds of individuals; each individual is independent of the others and requires its own forward modeling and cost function calculation. The framework is suitable for a variety of global optimization algorithms, and we test it using the particle swarm optimization algorithm as an example. Both the synthetic and field examples achieve good results, indicating the effectiveness of the framework.