Accurate 3-dimensional (3-D) reconstruction technology for nondestructive testing based on digital radiography (DR) is of great importance for alleviating the drawbacks of the existing computed tomography (CT)-based method. The commonly used Monte Carlo simulation method ensures well-performing imaging results for DR. However, for 3-D reconstruction, it is limited by its high time consumption. To solve this problem, this study proposes a parallel computing method to accelerate Monte Carlo simulation of projection images with a parallel interface and a specific DR application. The images are utilized for 3-D reconstruction of the test model. We verify the accuracy of parallel computing for DR and evaluate the performance of two parallel computing modes, multithreaded applications (G4-MT) and message-passing interfaces (G4-MPI), by assessing parallel speedup and efficiency. This study also explores the scalability of the hybrid G4-MPI and G4-MT mode. The results show that the two parallel computing modes can significantly reduce the Monte Carlo simulation time: the parallel speedup grows approximately linearly, and the parallel efficiency is maintained at a high level. The hybrid mode has strong scalability, as the overall run time of the 180 simulations using 320 threads is 15.35 h with 10 billion particles emitted, and the parallel speedup reaches up to 151.36. The 3-D reconstruction of the model is achieved with the filtered back projection (FBP) algorithm using 180 projection images obtained with the hybrid G4-MPI and G4-MT mode. The quality of the reconstructed sliced images is satisfactory, as the images reflect the internal structure of the test model. The method is further applied to a complex model, and the quality of the reconstructed images is evaluated.
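For reference, the speedup and efficiency figures quoted above follow the standard definitions S(p) = T_serial / T_parallel and E(p) = S(p) / p; the short Python check below simply applies them to the reported 320-thread hybrid run (the helper names are illustrative, not from the paper's code):

```python
# Standard parallel-performance metrics used in the abstract above:
# speedup S(p) = T_serial / T_parallel(p), efficiency E(p) = S(p) / p.
def speedup(t_serial_hours, t_parallel_hours):
    return t_serial_hours / t_parallel_hours

def efficiency(s, n_threads):
    return s / n_threads

# Reported hybrid-mode result: S = 151.36 on 320 threads.
s_reported = 151.36
print(f"efficiency = {efficiency(s_reported, 320):.2%}")    # about 47% of ideal

# Equivalently, the 15.35 h parallel run time implies a hypothetical
# serial run time of roughly S * T_parallel:
print(f"implied serial time ~ {s_reported * 15.35:.0f} h")  # about 2323 h
```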
A computational fluid dynamics (CFD) solver for a GPU/CPU heterogeneous-architecture parallel computing platform is developed to simulate incompressible flows on billion-level grid points. To solve the Poisson equation, the conjugate gradient method is used as the basic solver, and a Chebyshev method combined with a Jacobi sub-preconditioner is used as the preconditioner. The developed CFD solver shows good parallel efficiency, exceeding 90% in the weak-scalability test when the number of grid points allocated to each GPU card is greater than 208^3. In the acceleration test, a simulation with 1040^3 grid points on 125 GPU cards runs 203.6x faster than on the same number of CPU cores. The developed solver is then tested on a two-dimensional lid-driven cavity flow and a three-dimensional Taylor-Green vortex flow. The results are consistent with previous results in the literature.
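The paper couples a Chebyshev method with a Jacobi sub-preconditioner; purely as an illustration of the simplest building block (not the authors' GPU solver), a plain Jacobi-preconditioned conjugate gradient on a toy 1-D Poisson matrix can be sketched in Python as follows:

```python
import numpy as np

def jacobi_pcg(A, b, tol=1e-8, max_iter=1000):
    """Conjugate gradient with a Jacobi (diagonal) preconditioner M = diag(A)."""
    x = np.zeros_like(b)
    r = b - A @ x
    Minv = 1.0 / np.diag(A)          # apply M^{-1} by elementwise scaling
    z = Minv * r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = Minv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Tiny 1-D Poisson test problem (tridiagonal -1, 2, -1 matrix).
n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = jacobi_pcg(A, b)
print(np.allclose(A @ x, b, atol=1e-6))
```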
Due to the inherently insecure nature of the Internet, it is crucial to ensure the secure transmission of image data over this network. Additionally, given the limitations of computers, it becomes even more important to employ efficient and fast image encryption techniques. While 1D chaotic maps offer a practical approach to real-time image encryption, their limited flexibility and increased vulnerability restrict their practical application. In this research, we have utilized a 3D Hindmarsh-Rose model to construct a secure cryptosystem. The randomness of the chaotic map is assessed through standard analysis. The proposed system enhances security by incorporating an increased number of system parameters and a wide range of chaotic parameters, as well as ensuring a uniform distribution of chaotic signals across the entire value space. Additionally, a fast image encryption technique utilizing the new chaotic system is proposed. The novelty of the approach is confirmed through time complexity analysis. To further strengthen the resistance against cryptanalysis and differential attacks, the SHA-256 algorithm is employed for secure key generation. Experimental results across a number of parameters demonstrate the strong cryptographic performance of the proposed image encryption approach, highlighting its suitability for secure communication. Moreover, the security of the proposed scheme has been compared with state-of-the-art image encryption schemes, and all comparison metrics indicate the superior performance of the proposed scheme.
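The abstract mentions SHA-256-based key generation to resist differential attacks. A common pattern in chaos-based ciphers, shown here only as a hypothetical sketch and not as the paper's exact construction, is to hash the plaintext image and fold the digest into the chaotic map's initial values, so that any change in the image changes the keystream:

```python
import hashlib
import numpy as np

def derive_key_params(image_bytes: bytes, n_params: int = 4):
    """Hash the plaintext image with SHA-256 and map the digest to
    chaotic-map initial conditions/parameters in (0, 1).
    Illustrative only; not the paper's exact key schedule."""
    digest = hashlib.sha256(image_bytes).digest()              # 32 bytes
    chunks = np.frombuffer(digest, dtype=np.uint8).reshape(n_params, -1)
    # Fold each 8-byte chunk into a float strictly inside (0, 1).
    return [(int.from_bytes(c.tobytes(), "big") % 10**8 + 1) / (10**8 + 2)
            for c in chunks]

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)      # stand-in image
print(derive_key_params(img.tobytes()))
```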
Conventional gradient-based full waveform inversion (FWI) is a local optimization, which is highly dependent on the initial model and prone to becoming trapped in local minima. Globally optimal FWI that can overcome this limitation is particularly attractive, but it is currently limited by its huge computational cost. In this paper, we propose a globally optimal FWI framework based on GPU parallel computing, which greatly improves the efficiency and is expected to make globally optimal FWI more widely used. In this framework, we simplify and recombine the model parameters and optimize the model iteratively. Each iteration contains hundreds of individuals, each individual is independent of the others, and each individual involves forward modeling and a cost-function calculation. The framework is suitable for a variety of global optimization algorithms, and we test it with the particle swarm optimization algorithm as an example. Both the synthetic and field examples achieve good results, indicating the effectiveness of the framework.
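As a sketch of how such a framework parallelizes, here is a minimal serial particle swarm optimizer (illustrative only, not the authors' GPU code); the point is that every individual's forward modeling and cost evaluation within one iteration is independent of the others, so the population can be evaluated concurrently:

```python
import numpy as np

def pso_minimize(cost, dim, n_particles=200, iters=100, bounds=(-1.0, 1.0)):
    """Minimal particle swarm optimizer. Each particle's cost evaluation is
    independent, which is what makes the per-iteration work embarrassingly
    parallel (on GPUs in the paper; evaluated serially here)."""
    lo, hi = bounds
    x = np.random.uniform(lo, hi, (n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([cost(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()
    w, c1, c2 = 0.7, 1.5, 1.5
    for _ in range(iters):
        r1 = np.random.rand(n_particles, dim)
        r2 = np.random.rand(n_particles, dim)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([cost(p) for p in x])      # independent evaluations
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, pbest_f.min()

best, best_f = pso_minimize(lambda m: np.sum((m - 0.3) ** 2), dim=5)
print(best_f)
```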
The Spectral Statistical Interpolation (SSI) analysis system of NCEP is used to assimilate meteorological data from Global Positioning Satellite System (GPS/MET) refraction angles with the variational technique. Verification against radiosonde data shows that including GPS/MET observations in the analysis makes an overall improvement to the analysis variables of temperature, winds, and water vapor. However, the variational model with the ray-tracing method is quite expensive for numerical weather prediction and climate research. For example, about 4000 GPS/MET refraction angles need to be assimilated to produce an ideal global analysis, and just one iteration of minimization takes more than 24 hours of CPU time on NCEP's Cray C90 computer. Although efforts have been made to reduce the computational cost, it is still prohibitive for operational data assimilation. In this paper, a parallel version of the three-dimensional variational data assimilation model for GPS/MET occultation measurements, suitable for massively parallel processor architectures, is developed. The divide-and-conquer strategy is used to achieve parallelism and is implemented by message passing. The authors present the principles of the code's design and examine its performance on state-of-the-art parallel computers in China. The results show that this parallel model scales favorably as the number of processors is increased. With the Memory-IO technique implemented by the authors, the wall clock time per iteration for assimilating 1420 refraction angles is reduced from 45 s to 12 s using 1420 processors. This suggests that the new parallelized code has the potential to be useful in numerical weather prediction (NWP) and climate studies.
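The divide-and-conquer, message-passing pattern described above can be sketched with mpi4py; the observation operator and array sizes below are toy stand-ins, not the authors' assimilation code:

```python
# Run with, e.g.: mpiexec -n 8 python assimilate_sketch.py
# Rank 0 splits the set of refraction-angle observations, every rank evaluates
# its share of a (toy) observation operator, and the partial cost-function
# contributions are reduced back to rank 0.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    obs = np.random.rand(1420)                        # stand-in observations
    chunks = np.array_split(obs, size)
else:
    chunks = None

local_obs = comm.scatter(chunks, root=0)              # divide
local_cost = float(np.sum((local_obs - 0.5) ** 2))    # conquer (toy operator)
total_cost = comm.reduce(local_cost, op=MPI.SUM, root=0)

if rank == 0:
    print(f"cost over {size} ranks: {total_cost:.3f}")
```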
The flexibility of traditional image processing systems is limited because those systems are designed for specific applications. In this paper, a new TMS320C64x-based multi-DSP parallel computing architecture is presented. It has many promising characteristics, such as powerful computing capability, broad I/O bandwidth, topology flexibility, and expandability. The performance of the parallel system is evaluated through practical experiments.
The vertex solution for estimating the static displacement bounds of structures with uncertain-but-bounded parameters is studied in this paper. For the linear static problem, when there are uncertain interval parameters in the stiffness matrix and the vector of applied forces, the static response may be an interval. Based on interval operations, the interval solution obtained by the vertex solution is more accurate and more credible than that of other methods (such as the perturbation method). However, the vertex solution method with traditional serial computing usually requires a large computational effort, especially for large structures. In order to avoid the disadvantages of heavy calculation and long runtime, a parallel computing approach that can be used in large-scale computation is presented in this paper. Two kinds of parallel computing algorithms are proposed based on the vertex solution. The parallel computation makes it possible to solve many interval problems that cannot be handled by traditional interval analysis methods.
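A minimal sketch of the vertex idea on a toy 2-DOF interval system, with the independent vertex solves distributed over a process pool (the numbers and helper names are illustrative, not from the paper):

```python
import itertools
import numpy as np
from multiprocessing import Pool

# Interval bounds for a toy 2-DOF system: K = [[k11, k12], [k12, k22]], load f.
bounds = {"k11": (9.5, 10.5), "k12": (-2.1, -1.9), "k22": (4.8, 5.2),
          "f1": (0.9, 1.1), "f2": (1.9, 2.1)}
names = list(bounds)

def solve_vertex(flags):
    """Pick one endpoint per interval parameter (a 'vertex') and solve K u = f."""
    v = {n: bounds[n][s] for n, s in zip(names, flags)}
    K = np.array([[v["k11"], v["k12"]], [v["k12"], v["k22"]]])
    f = np.array([v["f1"], v["f2"]])
    return np.linalg.solve(K, f)

if __name__ == "__main__":
    vertices = list(itertools.product([0, 1], repeat=len(names)))  # 2^5 solves
    with Pool() as pool:                      # independent solves in parallel
        sols = np.array(pool.map(solve_vertex, vertices))
    print("displacement lower bounds:", sols.min(axis=0))
    print("displacement upper bounds:", sols.max(axis=0))
```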
In the additive manufacturing field, current research on data processing mainly focuses on the slicing of large STL files or complicated CAD models. To improve efficiency and reduce slicing time, a parallel algorithm has great advantages. However, traditional algorithms cannot make full use of multi-core CPU hardware resources. In this paper, a fast parallel algorithm is presented to speed up data processing. A pipeline mode is adopted to design the parallel algorithm, and the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, the effects of the number of threads and the number of layers are investigated in a series of experiments. The experimental results show that the number of threads and the number of layers are two significant factors for the speedup ratio. The trend of speedup versus the number of threads reveals a positive relationship that agrees well with Amdahl's law, and the trend of speedup versus the number of layers also shows a positive relationship, in agreement with Gustafson's law (see the sketch below). The new algorithm uses topological information to compute contours with a parallel method for speedup. Another parallel algorithm based on data parallelism is used in the experiments to show that the pipeline parallel mode is more efficient. A final case study shows the outstanding performance of the new parallel algorithm. Compared with the serial slicing algorithm, the new pipeline parallel algorithm can make full use of multi-core CPU hardware and accelerate the slicing process; compared with the data-parallel slicing algorithm, the new algorithm adopts a pipeline parallel model and achieves a much higher speedup ratio and efficiency.
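For reference, the two scaling laws cited above take the following standard forms, shown here with an assumed parallel fraction p = 0.9:

```python
# Scaling laws for a parallel fraction p of the work and n threads:
#   Amdahl    (fixed problem size):          S(n) = 1 / ((1 - p) + p / n)
#   Gustafson (problem size grows with n):   S(n) = (1 - p) + p * n
def amdahl(p, n):
    return 1.0 / ((1.0 - p) + p / n)

def gustafson(p, n):
    return (1.0 - p) + p * n

for n in (2, 4, 8, 16):
    print(n, round(amdahl(0.9, n), 2), round(gustafson(0.9, n), 2))
# Amdahl saturates toward 1/(1-p) = 10 as n grows, while Gustafson keeps
# climbing, matching the observed thread-count and layer-count trends.
```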
In this paper, we propose a parallel computing technique for a content-based image retrieval (CBIR) system. This technique is mainly intended for a single node with a multi-core processor, which differs from techniques based on cluster or network computing architectures. Due to its specific applications (such as medical image processing) and its demanding hardware resource requirements, the CBIR system has not been widely used. With the increasing volume of image databases, the widespread use of multi-core processors, and the requirements on retrieval accuracy and speed, a retrieval strategy based on multi-core processors is needed to make retrieval faster and more convenient than before. Experimental results demonstrate that this parallel architecture can significantly improve the performance of the retrieval system. In addition, we also propose an efficient parallel technique that combines the cluster and multi-core techniques, which is intended to fit the emerging trend of cloud computing.
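The single-node, multi-core retrieval idea can be sketched as a chunked nearest-neighbor search over the feature database; the random stand-in features and helper names below are illustrative, not a real CBIR descriptor or the authors' code:

```python
import numpy as np
from multiprocessing import Pool

rng = np.random.default_rng(0)                                # fixed seed so all
FEATURES = rng.random((50_000, 128), dtype=np.float32)        # workers see the
QUERY = rng.random(128, dtype=np.float32)                     # same toy database

def best_in_chunk(span):
    """Score one chunk of the database against the query (L2 distance)."""
    lo, hi = span
    d = np.linalg.norm(FEATURES[lo:hi] - QUERY, axis=1)
    i = int(np.argmin(d))
    return lo + i, float(d[i])

if __name__ == "__main__":
    n_workers = 8
    edges = np.linspace(0, len(FEATURES), n_workers + 1, dtype=int)
    chunks = list(zip(edges[:-1], edges[1:]))
    with Pool(n_workers) as pool:             # each core scores its own chunk
        results = pool.map(best_in_chunk, chunks)
    print("best match:", min(results, key=lambda r: r[1]))
```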
The Global Positioning System (GPS) ray-shooting model is a self-sufficient observation operator in GPS/MET (Meteorology) data variational assimilation, linking the GPS observation data with the atmospheric state variables. However, its huge computational cost has so far made it impracticable in real data assimilation. To overcome this drawback, a parallel version of the GPS ray-shooting model has been developed and has been running successfully on the PC cluster built under the support of the China National Key Development Planning Project for Basic Research: The Large Scale Scientific Computation Research. High speedup and efficiency as well as good scalability are obtained. This is an important step toward making this GPS observation operator practicable.
This paper discusses the parallel computing of the third-generation Ocean General Circulation Model (OGCM) from the State Key Laboratory of Numerical Modeling for Atmospheric Sciences and Geophysical Fluid Dynamics (LASG), Institute of Atmospheric Physics (IAP). In addition, several optimization strategies for parallel computing of the OGCM (POGCM) on a Scalable Shared Memory Multiprocessor (S2MP) are presented. Using the Message Passing Interface (MPI), we obtain super-linear speedup on the SGI Origin 2000 for the parallel OGCM (POGCM) after optimization.
The parallel finite element method using the domain decomposition technique is adapted to a distributed parallel environment of a workstation cluster. An algorithm is presented for parallelizing the preconditioned conjugate gradient method based on domain decomposition. Using the developed code, a dam structural analysis problem is solved on the workstation cluster and the results are given. The parallel performance is analyzed.
The dynamic distribution model is one of the best schemes for parallel volume rendering. However, in a homogeneous cluster system, since the granularities are traditionally identical, all processors communicate almost simultaneously and the computational load may become unbalanced. To address these problems, a dynamic distribution model with prime granularity for parallel computing is presented. The granularities of the processors are pairwise relatively prime, and the related theory is introduced. High parallel performance can be achieved by minimizing network contention and using a load balancing strategy that ensures all processors finish almost simultaneously. Based on the Master-Slave-Gleaner (MSG) scheme, the parallel splatting algorithm for volume rendering is used to test the model on an IBM Cluster 1350 system. The experimental results show that the model brings a considerable improvement in performance, including computational efficiency, total execution time, speed, and load balancing.
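A small illustrative helper (not from the paper) for choosing pairwise relatively prime granularities near a target task size, so that the workers' request and communication cycles do not all coincide:

```python
# Pick pairwise coprime granularities >= a base task size. With coprime
# granularities the workers' communication points stagger instead of lining
# up, which is the intuition behind the prime-granularity scheme above.
from math import gcd

def prime_granularities(n_workers, base):
    """Return n_workers granularities >= base that are pairwise coprime."""
    chosen = []
    g = base
    while len(chosen) < n_workers:
        if all(gcd(g, c) == 1 for c in chosen):
            chosen.append(g)
        g += 1
    return chosen

print(prime_granularities(6, base=50))   # e.g. [50, 51, 53, 59, 61, 67]
```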
Large eddy simulation (LES) combined with a high-performance parallel computing method is applied in this paper to simulate the flow in a curved duct with a square cross section. The method consists of parallel domain decomposition of the grids, creation of a virtual diagonally bordered matrix, assembly of the boundary matrix, parallel LDL^T decomposition, parallel solution of the Poisson equation, parallel estimation of convergence, and so on. The parallel computing method can solve problems that are difficult to handle with traditional serial computing. Furthermore, existing microcomputers can be fully used to solve some large-scale problems of complex turbulent flow.
Large-scale computations are often performed in science and engineering areas such as numerical weather forecasting, astrophysics, energy resources exploration, nuclear weapon design, and plasma fusion research. Many applications in these areas need supercomputing power. The traditional mode of sequential processing cannot meet the demands of such computations; thus, parallel processing (PP) is now the main approach to high performance computing (HPC).
As well as shock wave and bubble pulse loading, cavitation also has a very significant influence on the dynamic response of surface ships and other near-surface marine structures to underwater explosive loading. In this paper, the acoustic-structure coupling method embedded in ABAQUS is adopted for the numerical analysis of underwater explosions considering cavitation. The shapes of both the bulk cavitation region and the local cavitation region are obtained, and they are in good agreement with analytical results. The duration of reloading is several times longer than that of the shock wave. Finally, both single and parallel computations of the cavitation effect on the dynamic response of a full-scale ship are presented, which prove that the reloading caused by cavitation cannot be ignored. All these results are helpful in understanding underwater explosion cavitation effects.
For CFD results to be useful in IC engine analysis, simulation results should be accurate and consistent. However, with the widespread use of parallel computing nowadays, it has been reported that a model may not give the same results for the same input when the parallel computing environment is changed. The effect of the parallel environment on simulation results therefore needs to be carefully investigated and understood. In this paper, the solution inconsistency of parallel CFD simulations is investigated. First, the concept of solution inconsistency in parallel computing is reviewed, followed by systematic CFD simulations specific to IC engine applications. The solution inconsistency with respect to the number of CPU cores was examined using the commercial CFD code CONVERGE. A test matrix was specifically designed to examine the effect of core count on the performance of the engine flow, spray, and combustion submodels. It was found that the flow-field simulation during the gas exchange process is the most sensitive to the number of cores among all submodels examined. An engineering solution was developed in which a local upwind scheme was used to control the variability, and it showed good performance. The implications of the observed inconsistency are also discussed.
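As general background on why core count can change results at all (offered as one common source, not as the specific mechanism in CONVERGE): floating-point addition is not associative, so a domain decomposition that changes the order of global reductions changes the rounded sums, and nonlinear models can amplify those differences. A small Python demonstration:

```python
# Summing the same data under different partitionings (mimicking different
# numbers of ranks) gives slightly different floating-point results.
import numpy as np

rng = np.random.default_rng(1)
values = rng.standard_normal(1_000_000).astype(np.float32)

def partitioned_sum(x, n_parts):
    """Sum each chunk, then sum the partial results, as a parallel reduction would."""
    return sum(float(np.sum(c)) for c in np.array_split(x, n_parts))

for cores in (1, 4, 16, 64):
    print(cores, f"{partitioned_sum(values, cores):.10f}")
# The totals typically differ in the last digits; a bitwise-identical result
# would require a fixed reduction order regardless of the decomposition.
```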
Processing large-scale 3-D gravity data is an important topic in the geophysics field. Many existing inversion methods lack the ability to process massive data and the capacity for practical application. This study proposes applying GPU parallel processing technology to the focusing inversion method, aiming to improve the inversion accuracy while speeding up the calculation and reducing memory consumption, thus obtaining fast and reliable inversion results for large complex models. In this paper, equivalent storage of the geometric trellis is used to calculate the sensitivity matrix, and the inversion is based on GPU parallel computing technology. The parallel computing program, optimized by reducing data transfer, access restrictions, and instruction restrictions as well as by latency hiding, greatly reduces memory usage, speeds up the calculation, and makes fast inversion of large models possible. By comparing and analyzing the computing speed of the traditional single-threaded CPU method and the CUDA-based GPU parallel technology, the excellent acceleration performance of GPU parallel computing is verified, which provides ideas for the practical application of theoretical inversion methods that are restricted by computing speed and computer memory. The model test verifies that the focusing inversion method can overcome the problems of severe skin effect and ambiguity of geological body boundaries. Moreover, increasing the number of model cells and inversion data can more clearly depict the boundary position of the anomalous body and delineate its specific shape.
In the Windows XP 64-bit operating system environment, several common PCs were used to build a cluster system, establishing a distributed memory parallel (DMP) computing system. A finite element model of a whole aircraft with about 260 million degrees of freedom (DOF) was developed using three-node and four-node thin shell elements and two-node beam elements. With the large commercial finite element software MSC.MARC and employing two kinds of domain decomposition methods (DDM), the parallel solution of the static strength analysis of the whole aircraft model was realized, offering a highly cost-effective approach to solving large-scale and complex finite element models.