186 articles found
Static Analysis Techniques for Fixing Software Defects in MPI-Based Parallel Programs
1
Authors: Norah Abdullah Al-Johany, Sanaa Abdullah Sharaf, Fathy Elbouraey Eassa, Reem Abdulaziz Alnanih. Computers, Materials & Continua (SCIE, EI), 2024, Issue 5, pp. 3139-3173 (35 pages)
The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed memory systems. However, MPI implementations can contain defects that impact the reliability and performance of parallel applications. Detecting and correcting these defects is crucial, yet there is a lack of published models specifically designed for correcting MPI defects. To address this, we propose a model for detecting and correcting MPI defects (DC_MPI), which aims to detect and correct defects in various types of MPI communication, including blocking point-to-point (BPTP), nonblocking point-to-point (NBPTP), and collective communication (CC). The defects addressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC), and message mismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a dataset consisting of 40 MPI codes. The results indicate that the model achieved a detection rate of 37 out of 40 codes, resulting in an overall detection accuracy of 92.5%. Additionally, the execution duration of the DC_MPI model ranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful in detecting and correcting defects in MPI implementations, thereby enhancing the reliability and performance of parallel applications. The DC_MPI model fills an important research gap and provides a valuable tool for improving the quality of MPI-based parallel computing systems.
Keywords: High-performance computing; parallel computing; software engineering; software defect; message passing interface; deadlock
MPI/OpenMP-Based Parallel Solver for Imprint Forming Simulation
2
Authors: Yang Li, Jiangping Xu, Yun Liu, Wen Zhong, Fei Wang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, Issue 7, pp. 461-483 (23 pages)
In this research, we present the pure open multi-processing (OpenMP), pure message passing interface (MPI), and hybrid MPI/OpenMP parallel solvers within the dynamic explicit central difference algorithm for the coining process to address the challenge of capturing fine relief features of approximately 50 microns. Achieving such precision demands the utilization of at least 7 million tetrahedron elements, surpassing the capabilities of traditional serial programs previously developed. To mitigate data races when calculating internal forces, intermediate arrays are introduced within the OpenMP directive. This helps ensure proper synchronization and avoid conflicts during parallel execution. Additionally, in the MPI implementation, the coins are partitioned into the desired number of regions. This division allows for efficient distribution of computational tasks across multiple processes. Numerical simulation examples are conducted to compare the three solvers with serial programs, evaluating correctness, acceleration ratio, and parallel efficiency. The results reveal a relative error of approximately 0.3% in forming force among the parallel and serial solvers, while the predicted insufficient material zones align with experimental observations. Additionally, speedup ratio and parallel efficiency are assessed for the coining process simulation. The pure MPI parallel solver achieves a maximum acceleration of 9.5 on a single computer (utilizing 12 cores) and the hybrid solver exhibits a speedup ratio of 136 in a cluster (using 6 compute nodes and 12 cores per compute node), showing the strong scalability of the hybrid MPI/OpenMP programming model. This approach effectively meets the simulation requirements for commemorative coins with intricate relief patterns.
Keywords: Hybrid MPI/OpenMP; parallel computing; MPI; OpenMP; imprint forming
A Hybrid Parallel Strategy for Isogeometric Topology Optimization via CPU/GPU Heterogeneous Computing
3
Authors: Zhaohui Xia, Baichuan Gao, Chen Yu, Haotian Han, Haobo Zhang, Shuting Wang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, Issue 2, pp. 1103-1137 (35 pages)
This paper aims to solve large-scale and complex isogeometric topology optimization problems that consume significant computational resources. A novel isogeometric topology optimization method with a hybrid parallel strategy of CPU/GPU is proposed, while the hybrid parallel strategies for stiffness matrix assembly, equation solving, sensitivity analysis, and design variable update are discussed in detail. To ensure the high efficiency of CPU/GPU computing, a workload balancing strategy is presented for optimally distributing the workload between CPU and GPU. To illustrate the advantages of the proposed method, three benchmark examples are tested to verify the hybrid parallel strategy in this paper. The results show that the hybrid method is more efficient than serial CPU and parallel GPU, while the speedups can be up to two orders of magnitude.
Keywords: Topology optimization; high-efficiency; isogeometric analysis; CPU/GPU parallel computing; hybrid OpenMP-CUDA
Parallel Image Processing: Taking Grayscale Conversion Using OpenMP as an Example
4
Authors: Bayan AlHumaidan, Shahad Alghofaily, Maitha Al Qhahtani, Sara Oudah, Naya Nagy. Journal of Computer and Communications, 2024, Issue 2, pp. 1-10 (10 pages)
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks.
Keywords: Parallel Computing; Image Processing; OpenMP; Parallel Programming; High Performance Computing; GPU (Graphic Processing Unit)
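The workload distribution described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the paper uses OpenMP (C/C++), whereas the sketch below uses a Python thread pool purely to show rows being handed out to workers, and CPython's global interpreter lock means it will not reproduce the reported speedups. The luma weights, the function names `gray_row` and `to_grayscale`, and the tiny test image are all hypothetical; the abstract does not state which conversion formula was used.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed luma weights (ITU-R BT.601); the abstract does not give the formula.
WEIGHTS = (0.299, 0.587, 0.114)


def gray_row(row):
    """Convert one row of (r, g, b) tuples to integer gray levels."""
    wr, wg, wb = WEIGHTS
    return [int(round(wr * r + wg * g + wb * b)) for r, g, b in row]


def to_grayscale(image, workers=4):
    """Convert an image (a list of rows) by handing rows to a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so output rows line up with input rows.
        return list(pool.map(gray_row, image))


# Hypothetical 2x2 test image: white, black, pure red, pure blue.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray = to_grayscale(image)
print(gray)  # [[255, 0], [76, 29]]
```

Rows are independent, which is why the task parallelizes so easily; the efficiency decline the abstract mentions would come from scheduling and memory traffic, which this sketch does not model.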
Parallel Inference for Real-Time Machine Learning Applications
5
Authors: Sultan Al Bayyat, Ammar Alomran, Mohsen Alshatti, Ahmed Almousa, Rayyan Almousa, Yasir Alguwaifli. Journal of Computer and Communications, 2024, Issue 1, pp. 139-146 (8 pages)
Hyperparameter tuning is a key step in developing high-performing machine learning models, but searching large hyperparameter spaces requires extensive computation using standard sequential methods. This work analyzes the performance gains from parallel versus sequential hyperparameter optimization. Using scikit-learn’s RandomizedSearchCV, this project tuned a Random Forest classifier for fake news detection via randomized grid search. Setting n_jobs to -1 enabled full parallelization across CPU cores. Results show the parallel implementation achieved over 5× faster CPU times and 3× faster total run times compared to sequential tuning. However, test accuracy slightly dropped from 99.26% sequentially to 99.15% with parallelism, indicating a trade-off between evaluation efficiency and model performance. Still, the significant computational gains allow more extensive hyperparameter exploration within reasonable timeframes, outweighing the small accuracy decrease. Further analysis could better quantify this trade-off across different models, tuning techniques, tasks, and hardware.
Keywords: Machine Learning Models; Computational Efficiency; Parallel Computing Systems; Random Forest Inference; Hyperparameter Tuning; Python Frameworks (TensorFlow, PyTorch, Scikit-Learn); High-Performance Computing
Realistic Efficiency Evaluations for Parallel Computations under Workstation Cluster
6
Authors: Mo Zeyao, Li Xiaomei (Dept. of Computer, Changsha Institute of Technology, Changsha, China, 410073). Wuhan University Journal of Natural Sciences (CAS), 1996, Issue Z1, pp. 329-336 (8 pages)
In recent years, high-performance scientific computing on workstation clusters connected by local area networks has become a hot topic. Owing to both the longer latency and the higher protocol-processing overhead compared with the powerful capacity of a single workstation, it is becoming critically important to balance not only the numerical load but also the communication load, and to overlap communication with computation during parallel computing. Hence, our efficiency evaluation rules must expose these capacities of a given parallel algorithm so that the existing algorithm can be optimized to attain its highest parallel efficiency. The traditional efficiency evaluation rules can no longer succeed in this work. Fortunately, thanks to Culler's detailed discussion in the LogP model of interconnection networks for MPP systems, we present in this paper a system of efficiency evaluation rules for parallel computations on a workstation cluster with the PVM3.0 parallel software framework. These rules satisfy the above requirements. Finally, two typical applications, synchronous and asynchronous, are designed to verify the validity of these rules on a cluster of 4 SGI workstations connected by Ethernet.
Keywords: Parallel efficiency evaluation; workstation cluster; PVM; network parallel computations
Parallel computing approach for efficient 3-D X-ray-simulated image reconstruction (Cited by: 1)
7
Authors: Ou-Yi Li, Yang Wang, Qiong Zhang, Yong-Hui Li. Nuclear Science and Techniques (SCIE, EI, CAS, CSCD), 2023, Issue 7, pp. 122-136 (15 pages)
Accurate 3-dimensional (3-D) reconstruction technology for nondestructive testing based on digital radiography (DR) is of great importance for alleviating the drawbacks of the existing computed tomography (CT)-based method. The commonly used Monte Carlo simulation method ensures well-performing imaging results for DR. However, for 3-D reconstruction, it is limited by its high time consumption. To solve this problem, this study proposes a parallel computing method to accelerate Monte Carlo simulation for projection images with a parallel interface and a specific DR application. The images are utilized for 3-D reconstruction of the test model. We verify the accuracy of parallel computing for DR and evaluate the performance of two parallel computing modes, multithreaded applications (G4-MT) and message-passing interfaces (G4-MPI), by assessing parallel speedup and efficiency. This study explores the scalability of the hybrid G4-MPI and G4-MT modes. The results show that the two parallel computing modes can significantly reduce the Monte Carlo simulation time because the parallel speedup increment of Monte Carlo simulations can be considered linear growth, and the parallel efficiency is maintained at a high level. The hybrid mode has strong scalability, as the overall run time of the 180 simulations using 320 threads is 15.35 h with 10 billion particles emitted, and the parallel speedup can be up to 151.36. The 3-D reconstruction of the model is achieved based on the filtered back projection (FBP) algorithm using 180 projection images obtained with the hybrid G4-MPI and G4-MT. The quality of the reconstructed sliced images is satisfactory because the images can reflect the internal structure of the test model. This method is applied to a complex model, and the quality of the reconstructed images is evaluated.
Keywords: Parallel computing; Monte Carlo; Digital radiography; 3-D reconstruction
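For readers comparing the speedup figures quoted across these abstracts, a common textbook definition of parallel efficiency is speedup divided by the number of workers. The sketch below applies that definition to the numbers in this abstract (speedup 151.36 on 320 threads). Whether the paper uses exactly this definition, or counts threads rather than physical cores, is an assumption; the function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Return speedup divided by the number of workers (threads or cores)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Figures quoted in the abstract above: speedup 151.36 using 320 threads.
eff = parallel_efficiency(151.36, 320)
print(f"efficiency = {eff:.3f}")  # efficiency = 0.473
```

The same function can be applied to any other speedup and worker count reported in this listing, with the caveat that each paper may measure its baseline differently.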
Energy-efficient task allocation for reliable parallel computation of cluster-based wireless sensor network in edge computing
8
Authors: Jiabao Wen, Jiachen Yang, Tianying Wang, Yang Li, Zhihan Lv. Digital Communications and Networks (SCIE, CSCD), 2023, Issue 2, pp. 473-482 (10 pages)
To efficiently complete a complex computation task, the complex task should be decomposed into subcomputation tasks that run parallel in edge computing. Wireless Sensor Network (WSN) is a typical application of parallel computation. To achieve highly reliable parallel computation for wireless sensor network, the network's lifetime needs to be extended. Therefore, a proper task allocation strategy is needed to reduce the energy consumption and balance the load of the network. This paper proposes a task model and a cluster-based WSN model in edge computing. In our model, different tasks require different types of resources and different sensors provide different types of resources, so our model is heterogeneous, which makes the model more practical. Then we propose a task allocation algorithm that combines the Genetic Algorithm (GA) and the Ant Colony Optimization (ACO) algorithm. The algorithm concentrates on energy conservation and load balancing so that the lifetime of the network can be extended. The experimental result shows the algorithm's effectiveness and advantages in energy conservation and load balancing.
Keywords: Wireless sensor network; parallel computation; task allocation; genetic algorithm; ant colony optimization algorithm; energy-efficient; load balancing
A Novel Parallel Computing Confidentiality Scheme Based on Hindmarsh-Rose Model
9
Authors: Jawad Ahmad, Mimonah Al Qathrady, Mohammed S. Alshehri, Yazeed Yasin Ghadi, Mujeeb Ur Rehman, Syed Aziz Shah. Computers, Materials & Continua (SCIE, EI), 2023, Issue 8, pp. 1325-1341 (17 pages)
Due to the inherent insecure nature of the Internet, it is crucial to ensure the secure transmission of image data over this network. Additionally, given the limitations of computers, it becomes even more important to employ efficient and fast image encryption techniques. While 1D chaotic maps offer a practical approach to real-time image encryption, their limited flexibility and increased vulnerability restrict their practical application. In this research, we have utilized a 3D Hindmarsh-Rose model to construct a secure cryptosystem. The randomness of the chaotic map is assessed through standard analysis. The proposed system enhances security by incorporating an increased number of system parameters and a wide range of chaotic parameters, as well as ensuring a uniform distribution of chaotic signals across the entire value space. Additionally, a fast image encryption technique utilizing the new chaotic system is proposed. The novelty of the approach is confirmed through time complexity analysis. To further strengthen the resistance against cryptanalysis attacks and differential attacks, the SHA-256 algorithm is employed for secure key generation. Experimental results through a number of parameters demonstrate the strong cryptographic performance of the proposed image encryption approach, highlighting its exceptional suitability for secure communication. Moreover, the security of the proposed scheme has been compared with state-of-the-art image encryption schemes, and all comparison metrics indicate the superior performance of the proposed scheme.
Keywords: Hindmarsh-Rose model; image encryption; SHA-256; parallel computing
Enhanced Parallelized DNA-Coded Stream Cipher Based on Multiplayer Prisoners’ Dilemma
10
Author: Khaled M. Suwais. Computers, Materials & Continua (SCIE, EI), 2023, Issue 5, pp. 2685-2704 (20 pages)
Data encryption is essential in securing exchanged data between connected parties. Encryption is the process of transforming readable text into scrambled, unreadable text using secure keys. Stream ciphers are one type of encryption algorithm that relies on only one key for decryption as well as encryption. Many existing encryption algorithms are developed based on either a mathematical foundation or on other biological, social or physical behaviours. One technique is to utilise the behavioural aspects of game theory in a stream cipher. In this paper, we introduce an enhanced Deoxyribonucleic acid (DNA)-coded stream cipher based on an iterated n-player prisoner’s dilemma paradigm. Our main goal is to contribute to adding more layers of randomness to the behaviour of the keystream generation process; these layers are inspired by the behaviour of multiple players playing a prisoner’s dilemma game. We implement parallelism to compensate for the additional processing time that may result from adding these extra layers of randomness. The results show that our enhanced design passes the statistical tests and achieves an encryption throughput of about 1,877 Mbit/s, which makes it a feasible secure stream cipher.
Keywords: Encryption; game theory; DNA cryptography; stream cipher; parallel computing
An incompressible flow solver on a GPU/CPU heterogeneous architecture parallel computing platform
11
Authors: Qianqian Li, Rong Li, Zixuan Yang. Theoretical & Applied Mechanics Letters (CSCD), 2023, Issue 5, pp. 387-393 (7 pages)
A computational fluid dynamics (CFD) solver for a GPU/CPU heterogeneous architecture parallel computing platform is developed to simulate incompressible flows on billion-level grid points. To solve the Poisson equation, the conjugate gradient method is used as a basic solver, and a Chebyshev method in combination with a Jacobi sub-preconditioner is used as a preconditioner. The developed CFD solver shows good performance on parallel efficiency, which exceeds 90% in the weak-scalability test when the number of grid points allocated to each GPU card is greater than 208³. In the acceleration test, it is found that running a simulation with 1040³ grid points on 125 GPU cards accelerates by 203.6x over the same number of CPU cores. The developed solver is then tested in the context of a two-dimensional lid-driven cavity flow and three-dimensional Taylor-Green vortex flow. The results are consistent with previous results in the literature.
Keywords: GPU acceleration; parallel computing; Poisson equation; preconditioner
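As background for the solver named in this abstract, the following is a minimal sketch of a preconditioned conjugate gradient iteration. It deliberately swaps in a plain Jacobi (diagonal) preconditioner rather than the Chebyshev method with a Jacobi sub-preconditioner that the paper describes, runs serially on dense Python lists rather than on GPUs, and uses a hypothetical 3-point 1-D Poisson matrix as the test case. The function name `jacobi_pcg` and all parameters are assumptions for illustration.

```python
def jacobi_pcg(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive definite A (list of lists)."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^-1

    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:  # stop once the residual norm is small
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Hypothetical test case: 1-D Poisson matrix tridiag(-1, 2, -1), exact solution [1, 1, 1].
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
b = [1.0, 0.0, 1.0]
x = jacobi_pcg(A, b)
```

In a production solver the matrix-vector product and the dot products are the pieces that get distributed across devices; the iteration structure itself stays the same.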
A Rayleigh Wave Globally Optimal Full Waveform Inversion Framework Based on GPU Parallel Computing
12
Authors: Zhao Le, Wei Zhang, Xin Rong, Yiming Wang, Wentao Jin, Zhengxuan Cao. Journal of Geoscience and Environment Protection, 2023, Issue 3, pp. 327-338 (12 pages)
Conventional gradient-based full waveform inversion (FWI) is a local optimization, which is highly dependent on the initial model and prone to trapping in local minima. Globally optimal FWI that can overcome this limitation is particularly attractive, but is currently limited by the huge amount of calculation. In this paper, we propose a globally optimal FWI framework based on GPU parallel computing, which greatly improves the efficiency, and is expected to make globally optimal FWI more widely used. In this framework, we simplify and recombine the model parameters, and optimize the model iteratively. Each iteration contains hundreds of individuals, each individual is independent of the other, and each individual contains forward modeling and cost function calculation. The framework is suitable for a variety of globally optimal algorithms, and we test the framework with particle swarm optimization algorithm for example. Both the synthetic and field examples achieve good results, indicating the effectiveness of the framework.
Keywords: Full Waveform Inversion; Finite-Difference Method; Globally Optimal Framework; GPU Parallel Computing; Particle Swarm Optimization
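The structure this abstract describes (many independent individuals per iteration, each needing its own cost evaluation) can be sketched as a particle swarm loop whose cost evaluations are farmed out to a worker pool. Everything here is an assumption for illustration: the quadratic `misfit` is a stand-in for forward modeling plus a real FWI cost function, a thread pool stands in for the GPU, and the inertia and acceleration coefficients (0.7, 1.5, 1.5), swarm size, and search bounds are generic choices, not values taken from the paper.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def misfit(model):
    """Stand-in cost; a real FWI cost would run forward modeling here."""
    return sum((m - 1.0) ** 2 for m in model)


def pso(cost, dim=2, swarm=20, iters=30, workers=4, seed=0):
    """Minimize cost with a basic particle swarm; returns (best, cost, history)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        best_cost = list(pool.map(cost, pos))
        best_pos = [p[:] for p in pos]
        g = min(range(swarm), key=best_cost.__getitem__)
        g_pos, g_cost = best_pos[g][:], best_cost[g]
        history = [g_cost]
        for _ in range(iters):
            for i in range(swarm):
                for d in range(dim):
                    vel[i][d] = (0.7 * vel[i][d]
                                 + 1.5 * rng.random() * (best_pos[i][d] - pos[i][d])
                                 + 1.5 * rng.random() * (g_pos[d] - pos[i][d]))
                    pos[i][d] += vel[i][d]
            # Individuals are independent, so their costs can be evaluated in parallel.
            costs = list(pool.map(cost, pos))
            for i, c in enumerate(costs):
                if c < best_cost[i]:
                    best_cost[i], best_pos[i] = c, pos[i][:]
                if c < g_cost:
                    g_cost, g_pos = c, pos[i][:]
            history.append(g_cost)
    return g_pos, g_cost, history


best_model, best_cost, history = pso(misfit)
```

Only the `pool.map` calls touch the expensive part, so swapping the thread pool for a GPU batch evaluation would leave the swarm update logic unchanged.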
Three-dimensional finite element simulation and reconstruction of jointed rock models using CT scanning and photogrammetry
13
Authors: Yingxian Lang, Zhengzhao Liang, Zhuo Dong. Journal of Rock Mechanics and Geotechnical Engineering (SCIE, CSCD), 2024, Issue 4, pp. 1348-1361 (14 pages)
The geometry of joints has a significant influence on the mechanical properties of rocks. To simplify the curved joint shapes in rocks, the joint shape is usually treated as straight lines or planes in most laboratory experiments and numerical simulations. In this study, the computerized tomography (CT) scanning and photogrammetry were employed to obtain the internal and surface joint structures of a limestone sample, respectively. To describe the joint geometry, the edge detection algorithms and a three-dimensional (3D) matrix mapping method were applied to reconstruct CT-based and photogrammetry-based jointed rock models. For comparison tests, the numerical uniaxial compression tests were conducted on an intact rock sample and a sample with a joint simplified to a plane using the parallel computing method. The results indicate that the mechanical characteristics and failure process of jointed rocks are significantly affected by the geometry of joints. The presence of joints reduces the uniaxial compressive strength (UCS), elastic modulus, and released acoustic emission (AE) energy of rocks by 37%–67%, 21%–24%, and 52%–90%, respectively. Compared to the simplified joint sample, the proposed photogrammetry-based numerical model makes the most of the limited geometry information of joints. The UCS, accumulative released AE energy, and elastic modulus of the photogrammetry-based sample were found to be very close to those of the CT-based sample. The UCS value of the simplified joint sample (i.e., 38.5 MPa) is much lower than that of the CT-based sample (i.e., 72.3 MPa). Additionally, the accumulative released AE energy observed in the simplified joint sample is 3.899 times lower than that observed in the CT-based sample. CT scanning provides a reliable means to visualize the joints in rocks, which can be used to verify the reliability of photogrammetry techniques. The application of the photogrammetry-based sample enables detailed analysis for estimating the mechanical properties of jointed rocks.
Keywords: X-ray computerized tomography (CT) scanning; photogrammetry; parallel computing; numerical simulation; uniaxial compression test; digital image processing
An efficient approach for the equivalent linearization of frame structures with plastic hinges under nonstationary seismic excitations
14
Authors: Huang Huan, Li Yingxiong, Li Yuyu. Earthquake Engineering and Engineering Vibration (SCIE, EI, CSCD), 2024, Issue 3, pp. 677-690 (14 pages)
An efficient approach is proposed for the equivalent linearization of frame structures with plastic hinges under nonstationary seismic excitations. The concentrated plastic hinges, described by the Bouc-Wen model, are assumed to occur at the two ends of a linear-elastic beam element. The auxiliary differential equations governing the plastic rotational displacements and their corresponding hysteretic displacements are replaced with linearized differential equations. Then, the two sets of equations of motion for the original nonlinear system can be reduced to an expanded-order equivalent linearized equation of motion for equivalent linear systems. To solve the equation of motion for equivalent linear systems, the nonstationary random vibration analysis is carried out based on the explicit time-domain method with high efficiency. Finally, the proposed treatment method for initial values of equivalent parameters is investigated in conjunction with parallel computing technology, which provides a new way of obtaining the equivalent linear systems at different time instants. Based on the explicit time-domain method, the key responses of interest of the converged equivalent linear system can be calculated through dimension reduction analysis with high efficiency. Numerical examples indicate that the proposed approach has high computational efficiency, and shows good applicability to weak nonlinear and medium-intensity nonlinear systems.
Keywords: Nonstationary random vibration; plastic hinge; equivalent linearization method; explicit time-domain method; parallel computation
Analysis on intersections between fractures by parallel computation (Cited by: 10)
15
Authors: Zhiyu Li, Mingyu Wang, Jianhui Zhao, Xiaohui Qiao. International Journal of Coal Science & Technology (EI, CAS), 2014, Issue 3, pp. 356-363 (8 pages)
The discrete fracture network model is a powerful tool for fractured rock mass fluid flow simulations and supports safety assessments of coal mine hazards such as water inrush. Intersection analysis, which identifies all pairs of intersected fractures (the basic components composing the connectivity of a network), is one of its crucial procedures. This paper attempts to improve intersection analysis through parallel computing. Considering a seamless interfacing with other procedures in modeling, two algorithms are designed and presented, of which one is a completely independent parallel procedure with some redundant computations and the other is an optimized version with reduced redundancy. A numerical study indicates that both of the algorithms are practical and can significantly improve the computational performance of intersection analysis for large-scale simulations. Moreover, the preferred application conditions for the two algorithms are also discussed.
Keywords: Fracture intersections; discrete fracture network; intersection analysis; parallel computing
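To make "identifying all pairs of intersected fractures" concrete, here is a small sketch that splits the list of candidate pairs across workers and tests each pair independently. It is not either of the two algorithms from the paper: fractures are simplified to 2-D line segments (an assumption; the abstract does not describe the fracture geometry), only proper crossings are detected, and a thread pool stands in for whatever parallel platform the authors used. The names `segments_cross` and `intersecting_pairs` and the sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): -1, 0, or 1."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(s, t):
    """True if segments s and t properly cross (touching and collinear cases ignored)."""
    (a, b), (c, d) = s, t
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def intersecting_pairs(segments, workers=4):
    """Return sorted index pairs (i, j), i < j, of segments that cross."""
    pairs = list(combinations(range(len(segments)), 2))
    chunks = [pairs[k::workers] for k in range(workers)]  # round-robin split

    def scan(chunk):
        return [(i, j) for i, j in chunk
                if segments_cross(segments[i], segments[j])]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        found = [hit for part in pool.map(scan, chunks) for hit in part]
    return sorted(found)


# Hypothetical fracture traces in the plane.
fractures = [
    ((0, 0), (2, 2)),     # 0
    ((0, 2), (2, 0)),     # 1: crosses 0
    ((3, 0), (3, 2)),     # 2: isolated
    ((0.5, -1), (0.5, 3)),  # 3: crosses 0 and 1
]
hits = intersecting_pairs(fractures)
print(hits)  # [(0, 1), (0, 3), (1, 3)]
```

Because each pair test reads shared data without writing to it, the chunks need no synchronization, which is the property that makes this step a natural target for parallelization.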
PARALLEL ANALYSIS OF COMBINED FINITE/DISCRETE ELEMENT SYSTEMS ON PC CLUSTER (Cited by: 5)
16
Authors: 王福军 (Wang Fujun), Y.T. Feng, D.R.J. Owen, 张静 (Zhang Jing), 刘洋 (Liu Yang). Acta Mechanica Sinica (SCIE, EI, CAS, CSCD), 2004, Issue 5, pp. 534-540 (7 pages)
A computational strategy is presented for the nonlinear dynamic analysis of large-scale combined finite/discrete element systems on a PC cluster. In this strategy, a dual-level domain decomposition scheme is adopted to implement the dynamic domain decomposition. The domain decomposition approach perfectly matches the requirement of reducing the memory size per processor of the calculation. To treat the contact between boundary elements in neighbouring subdomains, the elements in a subdomain are classified into internal, interfacial and external elements. In this way, all the contact detection algorithms developed for a sequential computation could be adopted directly in the parallel computation. Numerical examples show that this implementation is suitable for simulating large-scale problems. Two typical numerical examples are given to demonstrate the parallel efficiency and scalability on a PC cluster.
Keywords: Parallel computation; finite element; discrete element; PC cluster
A parallel fast multipole BEM and its applications to large-scale analysis of 3-D fiber-reinforced composites (Cited by: 4)
17
Authors: Ting Lei, Zhenhan Yao, Haitao Wang, Pengbo Wang. Acta Mechanica Sinica (SCIE, EI, CAS, CSCD), 2006, Issue 3, pp. 225-232 (8 pages)
In this paper, an adaptive boundary element method (BEM) is presented for solving 3-D elasticity problems. The numerical scheme is accelerated by the new version of fast multipole method (FMM) and parallelized on distributed memory architectures. The resulting solver is applied to the study of representative volume element (RVE) for short fiber-reinforced composites with complex inclusion geometry. Numerical examples performed on a 32-processor cluster show that the proposed method is both accurate and efficient, and can solve problems of large size that are challenging to existing state-of-the-art domain methods.
Keywords: Boundary element method; fast multipole method; parallel computing; fiber-reinforced composites
Parallel Computing of a Variational Data Assimilation Model for GPS/MET Observation Using the Ray-Tracing Method (Cited by: 5)
18
Authors: 张昕 (Zhang Xin), 刘月巍 (Liu Yuewei), 王斌 (Wang Bin), 季仲贞 (Ji Zhongzhen). Advances in Atmospheric Sciences (SCIE, CAS, CSCD), 2004, Issue 2, pp. 220-226 (7 pages)
The Spectral Statistical Interpolation (SSI) analysis system of NCEP is used to assimilate meteorological data from the Global Positioning Satellite System (GPS/MET) refraction angles with the variational technique. Verified by radiosonde, including GPS/MET observations into the analysis makes an overall improvement to the analysis variables of temperature, winds, and water vapor. However, the variational model with the ray-tracing method is quite expensive for numerical weather prediction and climate research. For example, about 4000 GPS/MET refraction angles need to be assimilated to produce an ideal global analysis. Just one iteration of minimization will take more than 24 hours of CPU time on NCEP's Cray C90 computer. Although efforts have been taken to reduce the computational cost, it is still prohibitive for operational data assimilation. In this paper, a parallel version of the three-dimensional variational data assimilation model of GPS/MET occultation measurement suitable for massively parallel processor architectures is developed. The divide-and-conquer strategy is used to achieve parallelism and is implemented by message passing. The authors present the principles for the code's design and examine the performance on the state-of-the-art parallel computers in China. The results show that this parallel model scales favorably as the number of processors is increased. With the Memory-IO technique implemented by the author, the wall clock time per iteration used for assimilating 1420 refraction angles is reduced from 45 s to 12 s using 1420 processors. This suggests that the new parallelized code has the potential to be useful in numerical weather prediction (NWP) and climate studies.
Keywords: parallel computing; variational data assimilation; GPS/MET
New multi-DSP parallel computing architecture for real-time image processing (Cited by: 4)
19
Authors: Hu Junhong, Zhang Tianxu, Jiang Haoyang. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2006, Issue 4, pp. 883-889, 7 pages.
The flexibility of traditional image processing systems is limited because such systems are designed for specific applications. In this paper, a new TMS320C64x-based multi-DSP parallel computing architecture is presented. It has many promising characteristics, such as powerful computing capability, broad I/O bandwidth, topology flexibility, and expansibility. The parallel system performance is evaluated by practical experiment.
Keywords: parallel computing; image processing; real-time; computer architecture
Parallelized Implementation of the Finite Particle Method for Explicit Dynamics in GPU (Cited by: 6)
20
Authors: Jingzhe Tang, Yanfeng Zheng, Chao Yang, Wei Wang, Yaozhi Luo. Computer Modeling in Engineering & Sciences (SCIE, EI), 2020, Issue 1, pp. 5-31, 27 pages.
As a novel kind of particle method for explicit dynamics, the finite particle method (FPM) does not require the formation or solution of global matrices, and the evaluations of the element equivalent forces and particle displacements are decoupled in nature, thus making this method suitable for parallelization. The FPM also requires an acceleration strategy to overcome the heavy computational burden of its explicit framework for time-dependent dynamic analysis. To this end, a GPU-accelerated parallel strategy for the FPM is proposed in this paper. By taking advantage of the independence of each step of the FPM workflow, a generic parallelized computational framework for multiple types of analysis is established. Using the Compute Unified Device Architecture (CUDA), the GPU implementations of the main tasks of the FPM, such as evaluating and assembling the element equivalent forces and solving the kinematic equations for particles, are elaborated through careful thread management and memory optimization. Performance tests show that speedup ratios of 8, 25 and 48 are achieved for beams, hexahedral solids and triangular shells, respectively. For examples consisting of explicit dynamic analyses of shells and solids, comparisons with Abaqus using 1 to 8 CPU cores validate the accuracy of the results and demonstrate a maximum speed improvement of a factor of 11.2.
Keywords: finite particle method; GPU; parallel computing; explicit dynamics