Parallel computing approach for efficient 3-D X-ray-simulated image reconstruction (cited by: 1)
1
Authors: Ou-Yi Li, Yang Wang, Qiong Zhang, Yong-Hui Li. Nuclear Science and Techniques (SCIE, EI, CAS, CSCD), 2023, No. 7, pp. 122-136, 15 pages.
Accurate 3-dimensional (3-D) reconstruction technology for nondestructive testing based on digital radiography (DR) is of great importance for alleviating the drawbacks of the existing computed tomography (CT)-based method. The commonly used Monte Carlo simulation method ensures well-performing imaging results for DR. However, for 3-D reconstruction, it is limited by its high time consumption. To solve this problem, this study proposes a parallel computing method to accelerate Monte Carlo simulation for projection images with a parallel interface and a specific DR application. The images are utilized for 3-D reconstruction of the test model. We verify the accuracy of parallel computing for DR and evaluate the performance of two parallel computing modes, multithreaded applications (G4-MT) and message-passing interfaces (G4-MPI), by assessing parallel speedup and efficiency. This study explores the scalability of the hybrid G4-MPI and G4-MT modes. The results show that the two parallel computing modes can significantly reduce the Monte Carlo simulation time because the parallel speedup of the Monte Carlo simulations grows approximately linearly, and the parallel efficiency is maintained at a high level. The hybrid mode has strong scalability, as the overall run time of the 180 simulations using 320 threads is 15.35 h with 10 billion particles emitted, and the parallel speedup can be up to 151.36. The 3-D reconstruction of the model is achieved based on the filtered back projection (FBP) algorithm using 180 projection images obtained with the hybrid G4-MPI and G4-MT. The quality of the reconstructed sliced images is satisfactory because the images can reflect the internal structure of the test model. This method is applied to a complex model, and the quality of the reconstructed images is evaluated.
Keywords: parallel computing; Monte Carlo; digital radiography; 3-D reconstruction
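The abstract above reports parallel speedup and efficiency without defining them. A minimal sketch follows, assuming the usual textbook definitions (speedup = serial time / parallel time, efficiency = speedup / number of workers); the function names are hypothetical and the paper may use a different baseline.

```python
def parallel_speedup(serial_time, parallel_time):
    """Speedup S = T_serial / T_parallel (assumed textbook definition)."""
    return serial_time / parallel_time


def parallel_efficiency(speedup, n_workers):
    """Efficiency E = S / p, where p is the number of workers."""
    return speedup / n_workers


# Figures quoted in the abstract: a speedup of 151.36 on 320 threads.
efficiency = parallel_efficiency(151.36, 320)
print(f"efficiency = {efficiency:.3f}")
```

Under these assumed definitions, the quoted hybrid run works out to an efficiency of roughly 0.47 per thread.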
An incompressible flow solver on a GPU/CPU heterogeneous architecture parallel computing platform
2
Authors: Qianqian Li, Rong Li, Zixuan Yang. Theoretical & Applied Mechanics Letters (CSCD), 2023, No. 5, pp. 387-393, 7 pages.
A computational fluid dynamics (CFD) solver for a GPU/CPU heterogeneous architecture parallel computing platform is developed to simulate incompressible flows on billion-level grid points. To solve the Poisson equation, the conjugate gradient method is used as a basic solver, and a Chebyshev method in combination with a Jacobi sub-preconditioner is used as a preconditioner. The developed CFD solver shows good parallel efficiency, which exceeds 90% in the weak-scalability test when the number of grid points allocated to each GPU card is greater than 208^3. In the acceleration test, it is found that running a simulation with 1040^3 grid points on 125 GPU cards is 203.6 times faster than running it on the same number of CPU cores. The developed solver is then tested in the context of a two-dimensional lid-driven cavity flow and a three-dimensional Taylor-Green vortex flow. The results are consistent with previous results in the literature.
Keywords: GPU acceleration; parallel computing; Poisson equation; preconditioner
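To make the solver structure in this abstract concrete, here is a small sketch of a preconditioned conjugate gradient iteration on a 1-D Poisson system. It is a simplification, not the paper's method: it uses a plain Jacobi (diagonal) preconditioner rather than the Chebyshev/Jacobi combination, runs on the CPU in pure Python, and all function names are hypothetical.

```python
def poisson_matvec(x):
    """Apply the 1-D Poisson matrix tridiag(-1, 2, -1) to the vector x."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def jacobi_pcg(b, tol=1e-10, max_iter=1000):
    """Solve A x = b with CG, preconditioned by the diagonal of A (all 2s)."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    z = [ri / 2.0 for ri in r]       # Jacobi step: divide by the diagonal
    p = list(z)
    rz = dot(r, z)
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 < tol:   # converged on the residual norm
            return x, iteration
        ap = poisson_matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [ri / 2.0 for ri in r]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


solution, iterations = jacobi_pcg([1.0] * 8)
```

The matrix-vector product and the dot products are the parts a GPU implementation would parallelize; the iteration itself stays sequential.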
A Novel Parallel Computing Confidentiality Scheme Based on Hindmarsh-Rose Model
3
Authors: Jawad Ahmad, Mimonah Al Qathrady, Mohammed S. Alshehri, Yazeed Yasin Ghadi, Mujeeb Ur Rehman, Syed Aziz Shah. Computers, Materials & Continua (SCIE, EI), 2023, No. 8, pp. 1325-1341, 17 pages.
Due to the inherently insecure nature of the Internet, it is crucial to ensure the secure transmission of image data over this network. Additionally, given the limitations of computers, it becomes even more important to employ efficient and fast image encryption techniques. While 1D chaotic maps offer a practical approach to real-time image encryption, their limited flexibility and increased vulnerability restrict their practical application. In this research, we have utilized a 3D Hindmarsh-Rose model to construct a secure cryptosystem. The randomness of the chaotic map is assessed through standard analysis. The proposed system enhances security by incorporating an increased number of system parameters and a wide range of chaotic parameters, as well as ensuring a uniform distribution of chaotic signals across the entire value space. Additionally, a fast image encryption technique utilizing the new chaotic system is proposed. The novelty of the approach is confirmed through time complexity analysis. To further strengthen the resistance against cryptanalysis attacks and differential attacks, the SHA-256 algorithm is employed for secure key generation. Experimental results through a number of parameters demonstrate the strong cryptographic performance of the proposed image encryption approach, highlighting its exceptional suitability for secure communication. Moreover, the security of the proposed scheme has been compared with state-of-the-art image encryption schemes, and all comparison metrics indicate the superior performance of the proposed scheme.
Keywords: Hindmarsh-Rose model; image encryption; SHA-256; parallel computing
A Rayleigh Wave Globally Optimal Full Waveform Inversion Framework Based on GPU Parallel Computing
4
Authors: Zhao Le, Wei Zhang, Xin Rong, Yiming Wang, Wentao Jin, Zhengxuan Cao. Journal of Geoscience and Environment Protection, 2023, No. 3, pp. 327-338, 12 pages.
Conventional gradient-based full waveform inversion (FWI) is a local optimization, which is highly dependent on the initial model and prone to being trapped in local minima. Globally optimal FWI, which can overcome this limitation, is particularly attractive but is currently limited by its huge computational cost. In this paper, we propose a globally optimal FWI framework based on GPU parallel computing, which greatly improves efficiency and is expected to make globally optimal FWI more widely used. In this framework, we simplify and recombine the model parameters, and optimize the model iteratively. Each iteration contains hundreds of individuals; each individual is independent of the others and comprises forward modeling and a cost function calculation. The framework is suitable for a variety of globally optimal algorithms, and we test it with the particle swarm optimization algorithm as an example. Both the synthetic and field examples achieve good results, indicating the effectiveness of the framework.
Keywords: full waveform inversion; finite-difference method; globally optimal framework; GPU parallel computing; particle swarm optimization
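The per-iteration structure described above (many independent individuals, each needing a cost evaluation) can be illustrated with a small particle swarm optimization loop. This is only a sketch under stated assumptions: a quadratic stand-in replaces the forward modeling and misfit calculation, the inertia and acceleration coefficients are common defaults rather than values from the paper, and everything runs serially.

```python
import random


def sphere(x):
    """Stand-in cost; in FWI this would be forward modeling plus a misfit."""
    return sum(v * v for v in x)


def pso(cost, dim=2, n_particles=30, n_iters=200, seed=0):
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration weights
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = list(pbest[g]), pbest_cost[g]
    history = [gbest_cost]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
        # Each evaluation is independent of the others, so this is the
        # step a GPU framework would run in parallel.
        costs = [cost(p) for p in pos]
        for i, c in enumerate(costs):
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = list(pos[i]), c
            if c < gbest_cost:
                gbest, gbest_cost = list(pos[i]), c
        history.append(gbest_cost)
    return gbest, gbest_cost, history


best_position, best_cost, history = pso(sphere)
```

Only the list of cost evaluations is embarrassingly parallel; the update of personal and global bests is a cheap reduction performed once per iteration.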
Parallel Computing of a Variational Data Assimilation Model for GPS/MET Observation Using the Ray-Tracing Method (cited by: 5)
5
Authors: 张昕, 刘月巍, 王斌, 季仲贞. Advances in Atmospheric Sciences (SCIE, CAS, CSCD), 2004, No. 2, pp. 220-226, 7 pages.
The Spectral Statistical Interpolation (SSI) analysis system of NCEP is used to assimilate meteorological data from the Global Positioning Satellite System (GPS/MET) refraction angles with the variational technique. Verified by radiosonde, including GPS/MET observations in the analysis makes an overall improvement to the analysis variables of temperature, winds, and water vapor. However, the variational model with the ray-tracing method is quite expensive for numerical weather prediction and climate research. For example, about 4000 GPS/MET refraction angles need to be assimilated to produce an ideal global analysis. Just one iteration of minimization will take more than 24 hours of CPU time on the NCEP's Cray C90 computer. Although efforts have been made to reduce the computational cost, it is still prohibitive for operational data assimilation. In this paper, a parallel version of the three-dimensional variational data assimilation model of GPS/MET occultation measurement suitable for massively parallel processor architectures is developed. The divide-and-conquer strategy is used to achieve parallelism and is implemented by message passing. The authors present the principles for the code's design and examine the performance on the state-of-the-art parallel computers in China. The results show that this parallel model scales favorably as the number of processors is increased. With the Memory-IO technique implemented by the authors, the wall clock time per iteration used for assimilating 1420 refraction angles is reduced from 45 s to 12 s using 1420 processors. This suggests that the new parallelized code has the potential to be useful in numerical weather prediction (NWP) and climate studies.
Keywords: parallel computing; variational data assimilation; GPS/MET
New multi-DSP parallel computing architecture for real-time image processing (cited by: 4)
6
Authors: Hu Junhong, Zhang Tianxu, Jiang Haoyang. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2006, No. 4, pp. 883-889, 7 pages.
The flexibility of traditional image processing systems is limited because those systems are designed for specific applications. In this paper, a new TMS320C64x-based multi-DSP parallel computing architecture is presented. It has many promising characteristics such as powerful computing capability, broad I/O bandwidth, topology flexibility, and expansibility. The parallel system performance is evaluated by practical experiment.
Keywords: parallel computing; image processing; real-time; computer architecture
PARALLEL COMPUTING FOR STATIC RESPONSE ANALYSIS OF STRUCTURES WITH UNCERTAIN-BUT-BOUNDED PARAMETERS (cited by: 2)
7
Authors: Zhiping Qiu, Xiaojun Wang, Xu Zhang. Acta Mechanica Solida Sinica (SCIE, EI), 2008, No. 5, pp. 472-482, 11 pages.
The vertex solution for estimating the static displacement bounds of structures with uncertain-but-bounded parameters is studied in this paper. For the linear static problem, when there are uncertain interval parameters in the stiffness matrix and the vector of applied forces, the static response may be an interval. Based on the interval operations, the interval solution obtained by the vertex solution is more accurate and more credible than that of other methods (such as the perturbation method). However, the vertex solution method with traditional serial computing usually requires large computational effort, especially for large structures. To avoid the heavy computation and long runtime, a parallel version of the method suitable for large-scale computing is presented in this paper. Two kinds of parallel computing algorithms are proposed based on the vertex solution. The parallel computing will solve many interval problems which cannot be resolved by traditional interval analysis methods.
Keywords: vertex solution; interval analysis; UBB; parallel computing
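The vertex solution evaluates the structure at every combination of interval endpoints, and those evaluations do not depend on one another. A toy sketch, assuming a hypothetical two-spring model in place of a finite element solve and a thread pool in place of the paper's parallel algorithms (which the abstract does not detail):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def tip_displacement(params):
    """Tip displacement of two springs in series under an end force."""
    k1, k2, force = params
    return force / k1 + force / k2


def vertex_bounds(intervals, solve, max_workers=4):
    """Evaluate `solve` at every vertex of the parameter box; return (min, max)."""
    vertices = list(product(*intervals))  # 2**m endpoint combinations
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        responses = list(pool.map(solve, vertices))  # independent solves
    return min(responses), max(responses)


# Hypothetical data: k1 in [90, 110], k2 in [180, 220], fixed force 1980.
intervals = [(90.0, 110.0), (180.0, 220.0), (1980.0, 1980.0)]
lower, upper = vertex_bounds(intervals, tip_displacement)
print(lower, upper)
```

The number of vertices doubles with each uncertain parameter, which is why distributing the solves matters for large structures. In this toy case the response is monotonic in each parameter, so the vertex bounds are exact.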
Multi-core based parallel computing technique for content-based image retrieval (cited by: 1)
8
Authors: 陈文浩, 方昱春, 姚继锋, 张武. Journal of Shanghai University (English Edition), 2010, No. 1, pp. 55-59, 5 pages.
In this paper, we propose a parallel computing technique for content-based image retrieval (CBIR) systems. This technique is mainly intended for a single node with a multi-core processor, which distinguishes it from techniques based on cluster or network computing architectures. Due to its specific applications (such as medical image processing) and its demanding hardware resource requirements, the CBIR system has been prevented from being widely used. With the increasing volume of image databases, the widespread use of multi-core processors, and the requirements on retrieval accuracy and speed, a retrieval strategy based on multi-core processors is needed to make retrieval faster and more convenient than before. Experimental results demonstrate that this parallel architecture can significantly improve the performance of the retrieval system. In addition, we also propose an efficient parallel technique that combines the cluster and multi-core techniques, which is intended to suit the new trend of cloud computing.
Keywords: content-based image retrieval (CBIR); parallel computing; shared memory; feature extraction; similarity comparison
The Parallel Computing of GPS Ray-shooting Model
9
Authors: 李树勇, 王斌, 张昕. Advances in Atmospheric Sciences (SCIE, CAS, CSCD), 2001, No. 6, pp. 1185-1191, 7 pages.
The Global Positioning System (GPS) ray-shooting model is a self-sufficient observation operator in GPS/MET (Meteorology) data variational assimilation, linking the GPS observation data and the atmospheric state variables. However, its huge computational cost has so far made it impracticable in real data assimilation. To overcome this drawback, a parallel version of the GPS ray-shooting model has been developed and has been running successfully on the PC cluster built with the support of the China National Key Development Planning Project for Basic Research: The Large Scale Scientific Computation Research. High speedup and efficiency as well as good scalability are obtained. This is an important step for this GPS observation operator to become practicable. This research was supported by the National Natural Science Foundation of China (Grant No. 49825109), the National Key Development Planning Project for Basic Research (Grant No. 1999032801), and the CAS Key Innovation Direction Project (Grant No. KZCX2208).
Keywords: GPS ray-shooting; parallel computing; efficiency; scalability
Parallel computing for finite element structural analysis using conjugate gradient method based on domain decomposition
10
Authors: 付朝江, 张武. Journal of Shanghai University (English Edition) (CAS), 2006, No. 6, pp. 517-521, 5 pages.
A parallel finite element method using the domain decomposition technique is adapted to the distributed parallel environment of a workstation cluster. An algorithm is presented for parallelizing the preconditioned conjugate gradient method based on domain decomposition. Using the developed code, a dam structural analysis problem is solved on a workstation cluster and the results are given. The parallel performance is analyzed.
Keywords: parallel computing; workstation cluster; finite element; dam; domain decomposition
Parallel Computing of Ocean General Circulation Model
11
Authors: Zhang Li lun (1), Song Jun qiang (1), Li Xiao mei (2). Affiliations: 1. School of Computer Science, National University of Defense Technology, Changsha 410073, China; 2. Department of Computer, Institute of Command Technology, Beijing 100081, China. Wuhan University Journal of Natural Sciences (CAS), 2001, No. Z1, pp. 568-573, 6 pages.
This paper discusses the parallel computing of the third-generation Ocean General Circulation Model (OGCM) from the State Key Laboratory of Numerical Modeling for Atmospheric Science and Geophysical Fluid Dynamics (LASG), Institute of Atmospheric Physics (IAP). Several optimization strategies for parallel computing of the OGCM (POGCM) on a Scalable Shared Memory Multiprocessor (S2MP) are presented. Using the Message Passing Interface (MPI), we obtain superlinear speedup on the SGI Origin 2000 for the POGCM after optimization.
Keywords: parallel computing; OGCM; optimization strategy; S2MP
Dynamic Distribution Model with Prime Granularity for Parallel Computing
12
Authors: 孙济洲, 张绍敏, 李小图. Transactions of Tianjin University (EI, CAS), 2005, No. 5, pp. 343-347, 5 pages.
The dynamic distribution model is one of the best schemes for parallel volume rendering. However, in a homogeneous cluster system, since the granularity is traditionally identical, all processors communicate almost simultaneously and the computation load may become unbalanced. To address these problems, a dynamic distribution model with prime granularity for parallel computing is presented. The granularities of the processors are relatively prime, and the related theories are introduced. High parallel performance can be achieved by minimizing network competition and using a load balancing strategy that ensures all processors finish almost simultaneously. Based on the Master-Slave-Gleaner (MSG) scheme, the parallel Splatting algorithm for volume rendering is used to test the model on an IBM Cluster 1350 system. The experimental results show that the model can bring a considerable improvement in performance, including computation efficiency, total execution time, speed, and load balancing.
Keywords: granularity; parallel computing; load balancing; dynamic distribution model
High performance parallel computing of large eddy simulation of the flow in a curved duct with square cross section
13
Authors: 樊洪明, 黄伟, 魏英杰. Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2004, No. 4, pp. 442-446, 5 pages.
Large eddy simulation (LES) combined with a high performance parallel computing method is applied to simulate the flow in a curved duct with square cross section. The method consists of parallel domain decomposition of grids, creation of a virtual diagonal bordered matrix, assembly of the boundary matrix, parallel LDL^T decomposition, parallel solution of the Poisson equation, parallel estimation of convergence, and so on. The parallel computing method can solve problems that are difficult to solve using traditional serial computing. Furthermore, existing microcomputers can be fully used to resolve some large-scale problems of complex turbulent flow.
Keywords: turbulent flow; large eddy simulation; finite element method; domain decomposition method; parallel computing
Technique Development and Application: Construction of a Beowulf Cluster for Parallel Computing
14
Authors: FENG Kun, DONG Jiaqi, ZHANG Jinhua. Southwestern Institute of Physics Annual Report, 2004, No. 1, pp. 138-141, 4 pages.
Large-scale computations are often performed in science and engineering areas such as numerical weather forecasting, astrophysics, energy resources exploration, nuclear weapon design, and plasma fusion research. Many applications in these areas need supercomputing power. The traditional mode of sequential processing cannot meet the demands of those computations; thus, parallel processing (PP) is now the main approach to high performance computing (HPC).
Keywords: parallel computing; Beowulf cluster; MPICH
Parallel Computing of the Underwater Explosion Cavitation Effects on Full-scale Ship Structures (cited by: 7)
15
Authors: Zhi Zong, Yanjie Zhao, Fan Ye, Haitao Li, Gang Chen. Journal of Marine Science and Application, 2012, No. 4, pp. 469-477, 9 pages.
As well as shock wave and bubble pulse loading, cavitation also has a very significant influence on the dynamic response of surface ships and other near-surface marine structures to underwater explosive loadings. In this paper, the acoustic-structure coupling method embedded in ABAQUS is adopted for numerical analysis of underwater explosion considering cavitation. The shapes of both the bulk cavitation region and the local cavitation region are obtained, and they are in good agreement with analytical results. The duration of reloading is several times longer than that of a shock wave. Finally, both the single computation and the parallel computation of the cavitation effect on the dynamic responses of a full-scale ship are presented, which shows that the reloading caused by cavitation cannot be ignored. All these results are helpful in understanding underwater explosion cavitation effects.
Keywords: underwater explosion; cavitation; parallel computation; full-scale ship
Parallel Computing Based Solution for Reliability-constrained Distribution Network Planning
16
Authors: Yaqi Sun, Wenchuan Wu, Yi Lin, Hai Huang, Hao Chen. Journal of Modern Power Systems and Clean Energy (SCIE, EI, CSCD), 2024, No. 4, pp. 1147-1158, 12 pages.
The main goal of distribution network (DN) expansion planning is essentially to achieve minimal investment constrained by specified reliability requirements. The reliability-constrained distribution network planning (RcDNP) problem can be cast as an instance of mixed-integer linear programming (MILP), which involves an ultra-heavy computation burden, especially for large-scale DNs. In this paper, we propose a parallel computing based solution method for the RcDNP problem. The RcDNP is decomposed into a backbone grid problem and several lateral grid problems with coordination. Then, a parallelizable augmented Lagrangian algorithm with an acceleration method is developed to solve the coordination planning problems. The lateral grid problems are solved in parallel through coordination with the backbone grid planning problem. Gauss-Seidel iteration is adopted on the subset of the convex hull of the feasible region constructed by decomposition. Under mild conditions, the optimality and convergence of the proposed method are verified. Numerical tests show that the proposed method can significantly reduce the solution time and make the RcDNP applicable to real-world problems.
Keywords: distribution network expansion planning; reliability; parallel computing
A Hybrid Parallel Strategy for Isogeometric Topology Optimization via CPU/GPU Heterogeneous Computing
17
Authors: Zhaohui Xia, Baichuan Gao, Chen Yu, Haotian Han, Haobo Zhang, Shuting Wang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 2, pp. 1103-1137, 35 pages.
This paper aims to solve large-scale and complex isogeometric topology optimization problems that consume significant computational resources. A novel isogeometric topology optimization method with a hybrid parallel strategy of CPU/GPU is proposed, and the hybrid parallel strategies for stiffness matrix assembly, equation solving, sensitivity analysis, and design variable update are discussed in detail. To ensure the high efficiency of CPU/GPU computing, a workload balancing strategy is presented for optimally distributing the workload between CPU and GPU. To illustrate the advantages of the proposed method, three benchmark examples are tested to verify the hybrid parallel strategy. The results show that the hybrid method is faster than serial CPU and parallel GPU, and the speedups can be up to two orders of magnitude.
Keywords: topology optimization; high-efficiency; isogeometric analysis; CPU/GPU parallel computing; hybrid OpenMP-CUDA
High-precision parallel computing model of solute transport based on GPU acceleration
18
Authors: Shang-hong Zhang, Rong-qi Zhang, Wen-da Li, Xi-yan Yang, Yang Zhou. Journal of Hydrodynamics (SCIE, EI, CSCD), 2024, No. 1, pp. 202-212, 11 pages.
The scenario simulation analysis of water environmental emergencies is very important for risk prevention and control, and for emergency response. To quickly and accurately simulate the transport and diffusion process of high-intensity pollutants during sudden environmental water pollution events, this study proposes a high-precision pollution transport and diffusion model for unstructured grids based on Compute Unified Device Architecture (CUDA). The finite volume method with a total variation diminishing limiter using the r-factor proposed by Kong is used to reduce numerical diffusion and oscillation errors in the simulation of pollutants under sharp concentration conditions, and graphics processing unit acceleration technology is used to improve computational efficiency. The advection-diffusion process of the model is verified numerically using two benchmark cases, and the efficiency of the model is evaluated using an engineering example. The results demonstrate that the model performs well in the simulation of material transport in the presence of sharp concentration gradients. Additionally, it has high computational efficiency: the acceleration ratio is 46 relative to the single-threaded original model. The efficiency of the accelerated model meets the requirements of an engineering application, and rapid early warning and assessment of water pollution accidents is achieved.
Keywords: pollution transport and diffusion model; parallel computing; Compute Unified Device Architecture (CUDA); pollution event
Static Analysis Techniques for Fixing Software Defects in MPI-Based Parallel Programs
19
Authors: Norah Abdullah Al-Johany, Sanaa Abdullah Sharaf, Fathy Elbouraey Eassa, Reem Abdulaziz Alnanih. Computers, Materials & Continua (SCIE, EI), 2024, No. 5, pp. 3139-3173, 35 pages.
The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed memory systems. However, MPI implementations can contain defects that impact the reliability and performance of parallel applications. Detecting and correcting these defects is crucial, yet there is a lack of published models specifically designed for correcting MPI defects. To address this, we propose a model for detecting and correcting MPI defects (DC_MPI), which aims to detect and correct defects in various types of MPI communication, including blocking point-to-point (BPTP), nonblocking point-to-point (NBPTP), and collective communication (CC). The defects addressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC), and message mismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a dataset consisting of 40 MPI codes. The results indicate that the model detected defects in 37 out of 40 codes, an overall detection accuracy of 92.5%. Additionally, the execution duration of the DC_MPI model ranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful in detecting and correcting defects in MPI implementations, thereby enhancing the reliability and performance of parallel applications. The DC_MPI model fills an important research gap and provides a valuable tool for improving the quality of MPI-based parallel computing systems.
Keywords: high-performance computing; parallel computing; software engineering; software defect; message passing interface; deadlock
MPI/OpenMP-Based Parallel Solver for Imprint Forming Simulation
20
Authors: Yang Li, Jiangping Xu, Yun Liu, Wen Zhong, Fei Wang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 7, pp. 461-483, 23 pages.
In this research, we present pure open multi-processing (OpenMP), pure message passing interface (MPI), and hybrid MPI/OpenMP parallel solvers within the dynamic explicit central difference algorithm for the coining process, to address the challenge of capturing fine relief features of approximately 50 microns. Achieving such precision demands at least 7 million tetrahedron elements, surpassing the capabilities of the serial programs previously developed. To mitigate data races when calculating internal forces, intermediate arrays are introduced within the OpenMP directive. This helps ensure proper synchronization and avoid conflicts during parallel execution. Additionally, in the MPI implementation, the coins are partitioned into the desired number of regions. This division allows for efficient distribution of computational tasks across multiple processes. Numerical simulation examples are conducted to compare the three solvers with serial programs, evaluating correctness, acceleration ratio, and parallel efficiency. The results reveal a relative error of approximately 0.3% in forming force among the parallel and serial solvers, while the predicted insufficient material zones align with experimental observations. Additionally, the speedup ratio and parallel efficiency are assessed for the coining process simulation. The pure MPI parallel solver achieves a maximum acceleration of 9.5 on a single computer (utilizing 12 cores), and the hybrid solver exhibits a speedup ratio of 136 in a cluster (using 6 compute nodes and 12 cores per compute node), showing the strong scalability of the hybrid MPI/OpenMP programming model. This approach effectively meets the simulation requirements for commemorative coins with intricate relief patterns.
Keywords: hybrid MPI/OpenMP; parallel computing; MPI; OpenMP; imprint forming
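The data race mentioned in this abstract arises because neighboring elements add their internal forces to shared nodes. One common reading of "intermediate arrays" is that each worker accumulates into a private nodal array and the private arrays are summed afterwards. The sketch below illustrates that idea in Python with threads as an analogy; the paper's actual OpenMP layout is not given, and the function names and the unit "force" per node are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def element_forces(elements, n_nodes):
    """Accumulate element contributions into a private nodal array."""
    private = [0.0] * n_nodes
    for node_a, node_b in elements:
        private[node_a] += 1.0  # placeholder for the real internal force
        private[node_b] += 1.0
    return private


def assemble(elements, n_nodes, n_workers=3):
    """Assemble nodal forces without any worker writing to shared nodes."""
    chunks = [elements[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: element_forces(c, n_nodes), chunks))
    # Reduction step: sum the private arrays node by node.
    return [sum(column) for column in zip(*partials)]


elements = [(i, i + 1) for i in range(5)]  # 1-D chain: 5 elements, 6 nodes
forces = assemble(elements, 6)
print(forces)
```

The design trades memory (one nodal array per worker) for the absence of locks or atomic updates in the accumulation loop.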