230 articles found
Static Analysis Techniques for Fixing Software Defects in MPI-Based Parallel Programs
1
Authors: Norah Abdullah Al-Johany, Sanaa Abdullah Sharaf, Fathy Elbouraey Eassa, Reem Abdulaziz Alnanih. Source: Computers, Materials & Continua (SCIE, EI), 2024, Issue 5, pp. 3139-3173, 35 pages.
The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed memory systems. However, MPI implementations can contain defects that impact the reliability and performance of parallel applications. Detecting and correcting these defects is crucial, yet there is a lack of published models specifically designed for correcting MPI defects. To address this, we propose a model for detecting and correcting MPI defects (DC_MPI), which aims to detect and correct defects in various types of MPI communication, including blocking point-to-point (BPTP), nonblocking point-to-point (NBPTP), and collective communication (CC). The defects addressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC), and message mismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a dataset consisting of 40 MPI codes. The results indicate that the model achieved a detection rate of 37 out of 40 codes, resulting in an overall detection accuracy of 92.5%. Additionally, the execution duration of the DC_MPI model ranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful in detecting and correcting defects in MPI implementations, thereby enhancing the reliability and performance of parallel applications. The DC_MPI model fills an important research gap and provides a valuable tool for improving the quality of MPI-based parallel computing systems.
Keywords: high-performance computing; parallel computing; software engineering; software defect; message passing interface; deadlock
MPI/OpenMP-Based Parallel Solver for Imprint Forming Simulation
2
Authors: Yang Li, Jiangping Xu, Yun Liu, Wen Zhong, Fei Wang. Source: Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, Issue 7, pp. 461-483, 23 pages.
In this research, we present the pure open multi-processing (OpenMP), pure message passing interface (MPI), and hybrid MPI/OpenMP parallel solvers within the dynamic explicit central difference algorithm for the coining process to address the challenge of capturing fine relief features of approximately 50 microns. Achieving such precision demands the utilization of at least 7 million tetrahedron elements, surpassing the capabilities of traditional serial programs previously developed. To mitigate data races when calculating internal forces, intermediate arrays are introduced within the OpenMP directive. This helps ensure proper synchronization and avoid conflicts during parallel execution. Additionally, in the MPI implementation, the coins are partitioned into the desired number of regions. This division allows for efficient distribution of computational tasks across multiple processes. Numerical simulation examples are conducted to compare the three solvers with serial programs, evaluating correctness, acceleration ratio, and parallel efficiency. The results reveal a relative error of approximately 0.3% in forming force among the parallel and serial solvers, while the predicted insufficient material zones align with experimental observations. Additionally, speedup ratio and parallel efficiency are assessed for the coining process simulation. The pure MPI parallel solver achieves a maximum acceleration of 9.5 on a single computer (utilizing 12 cores) and the hybrid solver exhibits a speedup ratio of 136 in a cluster (using 6 compute nodes and 12 cores per compute node), showing the strong scalability of the hybrid MPI/OpenMP programming model. This approach effectively meets the simulation requirements for commemorative coins with intricate relief patterns.
Keywords: hybrid MPI/OpenMP; parallel computing; MPI; OpenMP; imprint forming
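Note on entry 2: the abstract's remedy for data races (per-thread intermediate arrays that are summed afterwards) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' solver: the 1D bar "mesh", `element_forces`, and `assemble` are hypothetical names, and Python threads stand in for OpenMP threads.

```python
from concurrent.futures import ThreadPoolExecutor

def element_forces(elem):
    # Hypothetical element: contributes +f to node a and -f to node b.
    a, b, f = elem
    return [(a, f), (b, -f)]

def assemble(elements, n_nodes, n_workers=4):
    """Sum element contributions into nodal forces without shared writes."""
    chunks = [elements[i::n_workers] for i in range(n_workers)]

    def work(chunk):
        # Each worker writes only to its own intermediate array.
        local = [0.0] * n_nodes
        for elem in chunk:
            for node, f in element_forces(elem):
                local[node] += f
        return local

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(work, chunks))

    # Serial reduction of the intermediate arrays into the global array.
    return [sum(p[i] for p in partials) for i in range(n_nodes)]

# Eight elements in a chain of nine nodes; element i carries force i + 1.
elements = [(i, i + 1, float(i + 1)) for i in range(8)]
forces = assemble(elements, n_nodes=9)
```

Because neighbouring elements share nodes, writing directly into one global array from several threads would race; the private arrays trade extra memory for the absence of locks.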
A Hybrid Parallel Strategy for Isogeometric Topology Optimization via CPU/GPU Heterogeneous Computing
3
Authors: Zhaohui Xia, Baichuan Gao, Chen Yu, Haotian Han, Haobo Zhang, Shuting Wang. Source: Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, Issue 2, pp. 1103-1137, 35 pages.
This paper aims to solve large-scale and complex isogeometric topology optimization problems that consume significant computational resources. A novel isogeometric topology optimization method with a hybrid parallel strategy of CPU/GPU is proposed, while the hybrid parallel strategies for stiffness matrix assembly, equation solving, sensitivity analysis, and design variable update are discussed in detail. To ensure the high efficiency of CPU/GPU computing, a workload balancing strategy is presented for optimally distributing the workload between CPU and GPU. To illustrate the advantages of the proposed method, three benchmark examples are tested to verify the hybrid parallel strategy in this paper. The results show that the hybrid method is faster than both serial CPU and parallel GPU computation, with speedups of up to two orders of magnitude.
Keywords: topology optimization; high-efficiency; isogeometric analysis; CPU/GPU parallel computing; hybrid OpenMP-CUDA
Parallel Image Processing: Taking Grayscale Conversion Using OpenMP as an Example
4
Authors: Bayan AlHumaidan, Shahad Alghofaily, Maitha Al Qhahtani, Sara Oudah, Naya Nagy. Source: Journal of Computer and Communications, 2024, Issue 2, pp. 1-10, 10 pages.
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP's effectiveness in accelerating image manipulation tasks.
Keywords: Parallel Computing; Image Processing; OpenMP; Parallel Programming; High Performance Computing; GPU (Graphic Processing Unit)
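Note on entry 4: the row-wise work distribution the abstract describes can be sketched as below. Several things here are assumptions rather than details from the paper: the BT.601 luma weights (the abstract does not state a formula), the names `gray_row` and `to_grayscale`, and the use of Python threads in place of OpenMP. CPython threads will not speed up this pure-Python loop; the sketch only shows the decomposition.

```python
from concurrent.futures import ThreadPoolExecutor

def gray_row(row):
    # Assumed weights (BT.601 luma); the abstract does not state the formula.
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]

def to_grayscale(image, n_workers=4):
    """Convert rows of (r, g, b) tuples to gray levels, one row per task."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(gray_row, image))  # map preserves row order

image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray = to_grayscale(image)
```

Each output pixel depends only on its own input pixel, so rows can be processed in any order without synchronization, which is why this task parallelizes so readily.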
Parallel Inference for Real-Time Machine Learning Applications
5
Authors: Sultan Al Bayyat, Ammar Alomran, Mohsen Alshatti, Ahmed Almousa, Rayyan Almousa, Yasir Alguwaifli. Source: Journal of Computer and Communications, 2024, Issue 1, pp. 139-146, 8 pages.
Hyperparameter tuning is a key step in developing high-performing machine learning models, but searching large hyperparameter spaces requires extensive computation using standard sequential methods. This work analyzes the performance gains from parallel versus sequential hyperparameter optimization. Using scikit-learn's RandomizedSearchCV, this project tuned a Random Forest classifier for fake news detection via randomized grid search. Setting n_jobs to -1 enabled full parallelization across CPU cores. Results show the parallel implementation achieved over 5× faster CPU times and 3× faster total run times compared to sequential tuning. However, test accuracy slightly dropped from 99.26% sequentially to 99.15% with parallelism, indicating a trade-off between evaluation efficiency and model performance. Still, the significant computational gains allow more extensive hyperparameter exploration within reasonable timeframes, outweighing the small accuracy decrease. Further analysis could better quantify this trade-off across different models, tuning techniques, tasks, and hardware.
Keywords: Machine Learning Models; Computational Efficiency; Parallel Computing Systems; Random Forest Inference; Hyperparameter Tuning; Python Frameworks (TensorFlow, PyTorch, Scikit-Learn); High-Performance Computing
The group search-based parallel algorithm for the serial Monte Carlo inversion method (Cited by: 3)
6
Authors: 魏超, 李小凡, 郑晓东. Source: Applied Geophysics (SCIE, CSCD), 2010, Issue 2, pp. 127-134, 193, 9 pages.
With the development of parallel computing technology, non-linear inversion calculation efficiency has been improving. However, for single-point search-based non-linear inversion methods, the implementation of parallel algorithms is a difficult issue. We introduce the idea of group search to the single-point search-based non-linear inversion algorithm, taking the quantum Monte Carlo method as an example for two-dimensional seismic wave velocity inversion and practical impedance inversion, and test the calculation efficiency using different numbers of nodes. The results show that the parallel algorithm is feasible and effective in both theoretical and practical data inversion. The parallel algorithm has good versatility. The algorithm efficiency increases with the number of nodes, but the rate of increase gradually decreases as nodes are added.
Keywords: non-linear inversion; single-point search; group search; parallel computation
Parallel Implementation of Global Illumination Using PVM
7
Authors: 孙济洲, Nicolas D Georganas. Source: Transactions of Tianjin University (EI, CAS), 2002, Issue 3, pp. 178-182, 5 pages.
In this paper an attempt to employ network resources to solve a complex and time-consuming problem is presented. The global illumination problem is selected as the study objective. An improved density estimation algorithm is first developed, in which more of the inherent concurrency is explored. Then its parallel implementation using a PVM mechanism and the running performance analysis are provided. The analysis results show that the expected speed-up is obtained and demonstrate that PVM has good application prospects for parallel computation in a distributed network.
Keywords: parallel computation; parallel virtual machine (PVM); global illumination; distributed network
Two-dimensional inversion of spectral induced polarization data using MPI parallel algorithm in data space (Cited by: 2)
8
Authors: 张志勇, 谭捍东, 王堃鹏, 林昌洪, 张斌, 谢茂笔. Source: Applied Geophysics (SCIE, CSCD), 2016, Issue 1, pp. 13-24, 217, 13 pages.
Traditional two-dimensional (2D) complex resistivity forward modeling is based on Poisson's equation, but spectral induced polarization (SIP) data are the coproducts of the induced polarization (IP) and the electromagnetic induction (EMI) effects. This is especially true under high frequencies, where the EMI effect can exceed the IP effect. 2D inversion that only considers the IP effect reduces the reliability of the inversion data. In this paper, we derive differential equations using Maxwell's equations. With the introduction of the Cole-Cole model, we use the finite-element method to conduct 2D SIP forward modeling that considers the EMI and IP effects simultaneously. The data-space Occam method, in which different constraints to the model smoothness and parametric boundaries are introduced, is then used to simultaneously obtain the four parameters of the Cole-Cole model using multi-array electric field data. This approach not only improves the stability of the inversion but also significantly reduces the solution ambiguity. To improve the computational efficiency, message passing interface programming was used to accelerate the 2D SIP forward modeling and inversion. Synthetic datasets were tested using both serial and parallel algorithms, and the tests suggest that the proposed parallel algorithm is robust and efficient.
Keywords: spectral induced polarization; 2D inversion; data-space method; Cole-Cole model; MPI; parallel computation
Forward and backward models for fault diagnosis based on parallel genetic algorithms (Cited by: 10)
9
Authors: Yi LIU, Ying LI, Yi-jia CAO, Chuang-xin GUO. Source: Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2008, Issue 10, pp. 1420-1425, 6 pages.
In this paper, a mathematical model consisting of forward and backward models is built on parallel genetic algorithms (PGAs) for fault diagnosis in a transmission power system. A new method to reduce the scale of fault sections is developed in the forward model, and the message passing interface (MPI) approach is chosen to parallelize the genetic algorithms by the global single-population master-slave method (GPGAs). The proposed approach is applied to a sample system consisting of 28 sections, 84 protective relays and 40 circuit breakers. Simulation results show that the new model based on GPGAs can achieve very fast computation in online applications of large-scale power systems.
Keywords: forward and backward models; fault diagnosis; global single-population master-slave genetic algorithms (GPGAs); parallel computation
Analysis on intersections between fractures by parallel computation (Cited by: 10)
10
Authors: Zhiyu Li, Mingyu Wang, Jianhui Zhao, Xiaohui Qiao. Source: International Journal of Coal Science & Technology (EI, CAS), 2014, Issue 3, pp. 356-363, 8 pages.
The discrete fracture network model is a powerful tool for fractured rock mass fluid flow simulations and supports safety assessments of coal mine hazards such as water inrush. Intersection analysis, which identifies all pairs of intersected fractures (the basic components composing the connectivity of a network), is one of its crucial procedures. This paper attempts to improve intersection analysis through parallel computing. Considering a seamless interfacing with other procedures in modeling, two algorithms are designed and presented, of which one is a completely independent parallel procedure with some redundant computations and the other is an optimized version with reduced redundancy. A numerical study indicates that both of the algorithms are practical and can significantly improve the computational performance of intersection analysis for large-scale simulations. Moreover, the preferred application conditions for the two algorithms are also discussed.
Keywords: fracture intersections; discrete fracture network; intersection analysis; parallel computing
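Note on entry 10: the general idea of splitting a pairwise intersection search across workers can be sketched as follows. This is not either of the paper's two algorithms, whose details the abstract does not give. It assumes fractures simplified to 2D line segments, and the names `intersects` and `intersecting_pairs` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def orient(p, q, r):
    # Sign of the cross product (q - p) x (r - p).
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def on_segment(p, q, r):
    # True if r lies in the bounding box of segment pq (r assumed collinear).
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def intersects(s, t):
    (a, b), (c, d) = s, t
    o1, o2 = orient(a, b, c), orient(a, b, d)
    o3, o4 = orient(c, d, a), orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other.
    return ((o1 == 0 and on_segment(a, b, c)) or (o2 == 0 and on_segment(a, b, d))
            or (o3 == 0 and on_segment(c, d, a)) or (o4 == 0 and on_segment(c, d, b)))

def intersecting_pairs(segments, n_workers=4):
    # Deal the index pairs (i < j) round-robin to the workers.
    pairs = list(combinations(range(len(segments)), 2))
    chunks = [pairs[k::n_workers] for k in range(n_workers)]

    def work(chunk):
        return [(i, j) for i, j in chunk if intersects(segments[i], segments[j])]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        found = [p for part in pool.map(work, chunks) for p in part]
    return sorted(found)

segments = [
    ((0, 0), (2, 2)),  # 0
    ((0, 2), (2, 0)),  # 1: crosses 0 at (1, 1)
    ((3, 0), (3, 2)),  # 2: isolated
    ((1, 1), (1, 3)),  # 3: starts at (1, 1), touching 0 and 1
]
result = intersecting_pairs(segments)
```

Every pair test is independent of the others, so no communication is needed until the partial lists are merged; the cost is that the all-pairs enumeration grows quadratically with the number of fractures.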
Parallel Computing of the Underwater Explosion Cavitation Effects on Full-scale Ship Structures (Cited by: 7)
11
Authors: Zhi Zong, Yanjie Zhao, Fan Ye, Haitao Li, Gang Chen. Source: Journal of Marine Science and Application, 2012, Issue 4, pp. 469-477, 9 pages.
As well as shock wave and bubble pulse loading, cavitation also has very significant influences on the dynamic response of surface ships and other near-surface marine structures to underwater explosive loadings. In this paper, the acoustic-structure coupling method embedded in ABAQUS is adopted to do numerical analysis of underwater explosion considering cavitation. Both the shape of bulk cavitation region and local cavitation region are obtained, and they are in good agreement with analytical results. The duration of reloading is several times longer than that of a shock wave. In the end, both the single computation and parallel computation of the cavitation effect on the dynamic responses of a full-scale ship are presented, which proved that reloading caused by cavitation is non-ignorable. All these results are helpful in understanding underwater explosion cavitation effects.
Keywords: underwater explosion; cavitation; parallel computation; full-scale ship
PARALLEL ANALYSIS OF COMBINED FINITE/DISCRETE ELEMENT SYSTEMS ON PC CLUSTER (Cited by: 5)
12
Authors: 王福军, Y.T.FENG, D.R.J.OWEN, 张静, 刘洋. Source: Acta Mechanica Sinica (SCIE, EI, CAS, CSCD), 2004, Issue 5, pp. 534-540, 7 pages.
A computational strategy is presented for the nonlinear dynamic analysis of large-scale combined finite/discrete element systems on a PC cluster. In this strategy, a dual-level domain decomposition scheme is adopted to implement the dynamic domain decomposition. The domain decomposition approach perfectly matches the requirement of reducing the memory size per processor of the calculation. To treat the contact between boundary elements in neighbouring subdomains, the elements in a subdomain are classified into internal, interfacial and external elements. In this way, all the contact detection algorithms developed for a sequential computation could be adopted directly in the parallel computation. Numerical examples show that this implementation is suitable for simulating large-scale problems. Two typical numerical examples are given to demonstrate the parallel efficiency and scalability on a PC cluster.
Keywords: parallel computation; finite element; discrete element; PC cluster
Parallelized Implementation of the Finite Particle Method for Explicit Dynamics in GPU (Cited by: 6)
13
Authors: Jingzhe Tang, Yanfeng Zheng, Chao Yang, Wei Wang, Yaozhi Luo. Source: Computer Modeling in Engineering & Sciences (SCIE, EI), 2020, Issue 1, pp. 5-31, 27 pages.
As a novel kind of particle method for explicit dynamics, the finite particle method (FPM) does not require the formation or solution of global matrices, and the evaluations of the element equivalent forces and particle displacements are decoupled in nature, thus making this method suitable for parallelization. The FPM also requires an acceleration strategy to overcome the heavy computational burden of its explicit framework for time-dependent dynamic analysis. To this end, a GPU-accelerated parallel strategy for the FPM is proposed in this paper. By taking advantage of the independence of each step of the FPM workflow, a generic parallelized computational framework for multiple types of analysis is established. Using the Compute Unified Device Architecture (CUDA), the GPU implementations of the main tasks of the FPM, such as evaluating and assembling the element equivalent forces and solving the kinematic equations for particles, are elaborated through careful thread management and memory optimization. Performance tests show that speedup ratios of 8, 25 and 48 are achieved for beams, hexahedral solids and triangular shells, respectively. For examples consisting of explicit dynamic analyses of shells and solids, comparisons with Abaqus using 1 to 8 CPU cores validate the accuracy of the results and demonstrate a maximum speed improvement of a factor of 11.2.
Keywords: finite particle method; GPU; parallel computing; explicit dynamics
PARALLEL FINITE ELEMENT ANALYSIS OF HIGH FREQUENCY VIBRATIONS OF QUARTZ CRYSTAL RESONATORS ON LINUX CLUSTER (Cited by: 4)
14
Authors: Ji Wang, Yu Wang, Wenke Hu, Wenhua Zhao, Jianke Du, Dejin Huang. Source: Acta Mechanica Solida Sinica (SCIE, EI), 2008, Issue 6, pp. 549-554, 6 pages.
Quartz crystal resonators are typical piezoelectric acoustic wave devices for frequency control applications with mechanical vibration frequency at the radio-frequency (RF) range. Precise analyses of the vibration and deformation are generally required in the resonator design and improvement process. The considerations include the presence of electrodes, mountings, bias fields such as temperature, initial stresses, and acceleration. Naturally, the finite element method is the only effective tool for such a coupled problem with multi-physics nature. The main challenge is the extremely large size of the resulting linear equations. For this reason, we have been employing the Mindlin plate equations to reduce the computational difficulty. In addition, we have to utilize the parallel computing techniques on Linux clusters, which are widely available for academic and industrial applications nowadays, to improve the computing efficiency. The general principle of our research is to use open source software components and public domain technology to reduce cost for developers and users on a Linux cluster. We start with a mesh generator specifically for quartz crystal resonators of rectangular and circular types, and the Mindlin plate equations are implemented for the finite element analysis. Computing techniques like parallel processing, sparse matrix handling, and the latest eigenvalue extraction package are integrated into the program. It is clear from our computation that the combination of these algorithms and methods on a cluster can meet the memory requirement and reduce computing time significantly.
Keywords: plate; vibration; quartz; resonator; FEM; parallel computing
A parallel fast multipole BEM and its applications to large-scale analysis of 3-D fiber-reinforced composites (Cited by: 4)
15
Authors: Ting Lei, Zhenhan Yao, Haitao Wang, Pengbo Wang. Source: Acta Mechanica Sinica (SCIE, EI, CAS, CSCD), 2006, Issue 3, pp. 225-232, 8 pages.
In this paper, an adaptive boundary element method (BEM) is presented for solving 3-D elasticity problems. The numerical scheme is accelerated by the new version of fast multipole method (FMM) and parallelized on distributed memory architectures. The resulting solver is applied to the study of representative volume element (RVE) for short fiber-reinforced composites with complex inclusion geometry. Numerical examples performed on a 32-processor cluster show that the proposed method is both accurate and efficient, and can solve problems of large size that are challenging to existing state-of-the-art domain methods.
Keywords: boundary element method; fast multipole method; parallel computing; fiber-reinforced composites
Parallel Computing of a Variational Data Assimilation Model for GPS/MET Observation Using the Ray-Tracing Method (Cited by: 5)
16
Authors: 张昕, 刘月巍, 王斌, 季仲贞. Source: Advances in Atmospheric Sciences (SCIE, CAS, CSCD), 2004, Issue 2, pp. 220-226, 7 pages.
The Spectral Statistical Interpolation (SSI) analysis system of NCEP is used to assimilate meteorological data from the Global Positioning Satellite System (GPS/MET) refraction angles with the variational technique. Verified by radiosonde, including GPS/MET observations into the analysis makes an overall improvement to the analysis variables of temperature, winds, and water vapor. However, the variational model with the ray-tracing method is quite expensive for numerical weather prediction and climate research. For example, about 4000 GPS/MET refraction angles need to be assimilated to produce an ideal global analysis. Just one iteration of minimization will take more than 24 hours of CPU time on NCEP's Cray C90 computer. Although efforts have been made to reduce the computational cost, it is still prohibitive for operational data assimilation. In this paper, a parallel version of the three-dimensional variational data assimilation model of GPS/MET occultation measurement suitable for massively parallel processor architectures is developed. The divide-and-conquer strategy is used to achieve parallelism and is implemented by message passing. The authors present the principles for the code's design and examine the performance on the state-of-the-art parallel computers in China. The results show that this parallel model scales favorably as the number of processors is increased. With the Memory-IO technique implemented by the author, the wall clock time per iteration used for assimilating 1420 refraction angles is reduced from 45 s to 12 s using 1420 processors. This suggests that the new parallelized code has the potential to be useful in numerical weather prediction (NWP) and climate studies.
Keywords: parallel computing; variational data assimilation; GPS/MET
New multi-DSP parallel computing architecture for real-time image processing (Cited by: 4)
17
Authors: Hu Junhong, Zhang Tianxu, Jiang Haoyang. Source: Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2006, Issue 4, pp. 883-889, 7 pages.
The flexibility of traditional image processing systems is limited because those systems are designed for specific applications. In this paper, a new TMS320C64x-based multi-DSP parallel computing architecture is presented. It has many promising characteristics such as powerful computing capability, broad I/O bandwidth, topology flexibility, and expansibility. The parallel system performance is evaluated by practical experiment.
Keywords: parallel computing; image processing; real-time; computer architecture
A UNIVERSAL ALGORITHM FOR PARALLEL CRC COMPUTATION AND ITS IMPLEMENTATION (Cited by: 5)
18
Authors: Xu Zhanqi, Yi Kechu, Liu Zengji. Source: Journal of Electronics (China), 2006, Issue 4, pp. 528-531, 4 pages.
Derived from a proposed universal mathematical expression, this paper investigates a novel algorithm for parallel Cyclic Redundancy Check (CRC) computation, which is an iterative algorithm that updates the check-bit sequence step by step and suits various argument selections of CRC computation. The proposed algorithm is well suited to hardware implementation. The simulation implementation and performance analysis suggest that it could efficiently speed up the computation compared with the conventional ones. The algorithm is implemented in hardware at as high as 21 Gbps, which suggests its usefulness in high-speed CRC computations such as those in Asynchronous Transfer Mode (ATM) networks and 10G Ethernet.
Keywords: Cyclic Redundancy Check (CRC); parallel computation; multi-bit divider
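Note on entry 18: the difference between a bit-serial CRC and a multi-bit update can be illustrated with the widely used table-driven method, which consumes eight message bits per step. This is a generic software technique, not the paper's universal expression or its hardware design; the choice of CRC-32 and the function names are assumptions made for this sketch.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed choice for this sketch)

def crc32_bitwise(data):
    """Serial reference: one message bit per shift step."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def make_table():
    # Precompute the effect of eight serial steps for every possible byte.
    table = []
    for value in range(256):
        crc = value
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
        table.append(crc)
    return table

TABLE = make_table()

def crc32_bytewise(data):
    """Multi-bit update: eight message bits consumed per step."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF

message = b"123456789"
serial = crc32_bitwise(message)
multibit = crc32_bytewise(message)
reference = zlib.crc32(message)  # standard-library CRC-32 for comparison
```

Both routines give the same check value as `zlib.crc32`; the table version simply folds eight shift-and-XOR steps into one lookup, which is the software analogue of widening the datapath in hardware.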
Parallelization and performance tuning of molecular dynamics code with OpenMP (Cited by: 3)
19
Authors: 白树仁, 冉丽萍, 鲁奎麟. Source: Journal of Central South University of Technology, 2006, Issue 3, pp. 260-264, 5 pages.
An OpenMP approach was proposed to parallelize the sequential molecular dynamics (MD) code on shared memory machines. When a code is converted from the sequential form to the parallel form, data dependence is a main problem. A traditional sequential molecular dynamics code is analyzed to find the data dependence segments in it, and two different methods, i.e., the recover method and the backward mapping method, were used to eliminate those data dependencies in order to parallelize this sequential MD code. The performance of the parallelized MD code was analyzed using performance analysis tools. The test results show that the computing size of this code increases sharply from 1 million atoms before parallelization to 20 million atoms after parallelization, and the wall clock time during computing is greatly reduced. Some hot-spots in this code are found and optimized by an improved algorithm. The efficiency of parallel computing is 30% higher than before, so calculation time is saved and larger-scale calculation problems can be solved.
Keywords: system analysis; molecular dynamics; parallel computing; performance tuning; OpenMP
3D parallel inversion of time-domain airborne EM data (Cited by: 2)
20
Authors: Liu Yun-He, Yin Chang-Chun, Ren Xiu-Yan, Qiu Chang-Kai. Source: Applied Geophysics (SCIE, CSCD), 2016, Issue 4, pp. 701-711, 740, 12 pages.
To improve the inversion accuracy of time-domain airborne electromagnetic data, we propose a parallel 3D inversion algorithm for airborne EM data based on the direct Gauss-Newton optimization. Forward modeling is performed in the frequency domain based on the scattered secondary electrical field. Then, the inverse Fourier transform and convolution of the transmitting waveform are used to calculate the EM responses and the sensitivity matrix in the time domain for arbitrary transmitting waves. To optimize the computational time and memory requirements, we use the EM "footprint" concept to reduce the model size and obtain the sparse sensitivity matrix. To improve the 3D inversion, we use the OpenMP library and parallel computing. We test the proposed 3D parallel inversion code using two synthetic datasets and a field dataset. The time-domain airborne EM inversion results suggest that the proposed algorithm is effective, efficient, and practical.
Keywords: airborne EM; time domain; three-dimensional inversion; footprint; parallel computing