12 articles found
An MPI parallel DEM-IMB-LBM framework for simulating fluid-solid interaction problems (cited by: 2)
1
Authors: Ming Xia, Liuhong Deng, Fengqiang Gong, Tongming Qu, Y.T. Feng, Jin Yu. Journal of Rock Mechanics and Geotechnical Engineering (SCIE, CSCD), 2024, No. 6, pp. 2219-2231, 13 pages.
The high-resolution DEM-IMB-LBM model can accurately describe pore-scale fluid-solid interactions, but its potential for use in geotechnical engineering analysis has not been fully unleashed due to its prohibitive computational costs. To overcome this limitation, a message passing interface (MPI) parallel DEM-IMB-LBM framework is proposed, aimed at enhancing computational efficiency. This framework utilises a static domain decomposition scheme, with the entire computational domain being decomposed into multiple subdomains according to predefined processors. A detailed parallel strategy is employed for both contact detection and hydrodynamic force calculation. In particular, a particle ID re-numbering scheme is proposed to handle particle transitions across subdomain interfaces. Two benchmarks are conducted to validate the accuracy and overall performance of the proposed framework. Subsequently, the framework is applied to simulate scenarios involving multi-particle sedimentation and submarine landslides. The numerical examples effectively demonstrate the robustness and applicability of the MPI parallel DEM-IMB-LBM framework.
Keywords: Discrete element method (DEM); Lattice Boltzmann method (LBM); Immersed moving boundary (IMB); Multi-cores parallelization; Message passing interface (MPI); CPU; Submarine landslides
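As a rough illustration of the static domain decomposition mentioned in the abstract above, the following Python sketch splits a 1D domain into equal-width slabs, one per process, and maps particle positions to the rank that owns them. The slab layout, the function names, and the pure-Python setting (no MPI calls) are assumptions made for illustration only; the paper's actual decomposition and particle ID re-numbering scheme are not described in this listing.

```python
def slab_bounds(x_min, x_max, n_procs):
    """Return (lo, hi) bounds of equal-width slabs, one per process."""
    width = (x_max - x_min) / n_procs
    return [(x_min + r * width, x_min + (r + 1) * width) for r in range(n_procs)]


def owner_rank(x, x_min, x_max, n_procs):
    """Map a particle x-coordinate to the rank whose slab contains it."""
    width = (x_max - x_min) / n_procs
    rank = int((x - x_min) // width)
    # Clamp so that a particle sitting exactly on x_max stays in the last slab.
    return min(max(rank, 0), n_procs - 1)


def partition(positions, x_min, x_max, n_procs):
    """Group particle indices by owning rank (static: bounds never change)."""
    owned = [[] for _ in range(n_procs)]
    for pid, x in enumerate(positions):
        owned[owner_rank(x, x_min, x_max, n_procs)].append(pid)
    return owned
```

In a real MPI code each rank would hold only its own list, and a particle whose `owner_rank` changes between time steps would have to be handed over to the neighbouring rank.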
Development of Ubiquitous Simulation Service Structure Based on High Performance Computing Technologies (cited by: 2)
2
Authors: Sang-Hyun CHO, Jeong-Kil CHOI. Journal of Materials Science & Technology (SCIE, EI, CAS, CSCD), 2008, No. 3, pp. 374-378, 5 pages.
Simulation has become essential in designing and developing new casting products and in improving manufacturing processes within limited time, because it reproduces the nature of the process so that developers can arrive at ideal casting designs. Many development groups around the world are making every effort to gain an early lead in the commercial simulation market, and they have already reported success stories in manufacturing fields by developing and providing high-performance, multipurpose simulation technologies. However, these tools mainly run on powerful desk-side computers operated by well-trained experts, so it is hard to spread the scientific design concept to newcomers in the casting field. To overcome upcoming problems in scientific casting design, we utilized information technologies and fully matured hardware backbones to spread an effective and scientific casting design mindset, and integrated them all into a Simulation Portal on the web. The portal provides scientific casting design on the net, including a ubiquitous means of access represented by the "Anyone, Anytime, Anywhere" concept for casting designs.
Keywords: Parallel computation; Message passing interface (MPI); Shared memory processing (SMP); Clustering; Ubiquitous
Parallel computation of unified finite-difference time-domain for underwater sound scattering (cited by: 2)
3
Authors: 冯玉田, 王朔中. Journal of Shanghai University (English Edition) (CAS), 2008, No. 2, pp. 120-125, 6 pages.
In this work, we treat scattering objects, water, surface and bottom in a truly unified manner in a parallel finite-difference time-domain (FDTD) scheme, which is suitable for distributed parallel computing in a message passing interface (MPI) programming environment. The algorithm is implemented on a cluster-based high performance computer system. Parallel computation is performed with different division methods in 2D and 3D situations. Based on analysis of the main factors affecting the speedup rate and parallel efficiency, data communication is reduced by selecting a suitable scheme of task division. A desirable scheme is recommended, giving a higher speedup rate and better efficiency. The results indicate that the unified parallel FDTD algorithm provides a solution to the numerical computation of acoustic scattering.
Keywords: parallel computation; finite-difference time-domain (FDTD); message passing interface (MPI); object scattering
Coupling analysis of transmission lines excited by space electromagnetic fields based on time domain hybrid method using parallel technique (cited by: 1)
4
Authors: Zhi-Hong Ye, Xiao-Lin Wu, Yao-Yao Li. Chinese Physics B (SCIE, EI, CAS, CSCD), 2020, No. 9, pp. 249-254, 6 pages.
We present a time domain hybrid method to realize the fast coupling analysis of transmission lines excited by space electromagnetic fields, in which the parallel finite-difference time-domain (FDTD) method, an interpolation scheme, and Agrawal model-based transmission line (TL) equations are organically integrated. Specifically, the Agrawal model is employed to establish the TL equations that describe the coupling effects of space electromagnetic fields on transmission lines. Then, the excitation fields functioning as distribution sources in the TL equations are calculated by the parallel FDTD method using the message passing interface (MPI) library scheme and the interpolation scheme. Finally, the TL equations are discretized by the central difference scheme of FDTD and assigned to multiple processors to obtain the transient responses on the terminal loads of these lines. The significant feature of the presented method is embodied in its parallel and synchronous calculations of the space electromagnetic fields and the transient responses on the lines. Numerical simulations of an ambient wave acting on multi-conductor transmission lines (MTLs), which are located on the PEC ground and in a shielded cavity respectively, are implemented to verify the accuracy and efficiency of the presented method.
Keywords: Agrawal model; transmission line equations; parallel FDTD method; message passing interface (MPI) library
Multi-Deme Parallel FGAs-Based Algorithm for Multitarget Tracking (cited by: 1)
5
Authors: 刘虎, 朱力立, 张焕春. Journal of Electronic Science and Technology of China, 2006, No. 1, pp. 12-17, 6 pages.
For data association in multisensor and multitarget tracking, a novel parallel algorithm is developed to improve the efficiency and real-time performance of the FGAs-based algorithm. A Cluster of Workstations (COW) with Message Passing Interface (MPI) is built, and the proposed Multi-Deme Parallel FGA (MDPFGA) is run on this platform. A series of special MDPFGAs are used to determine the static and the dynamic solutions of the generalized m-best S-D assignment problem respectively, as well as target state estimation in track management. Such an assignment-based parallel algorithm is demonstrated on a simulated passive sensor track formation and maintenance problem. While illustrating the feasibility of the proposed algorithm in multisensor multitarget tracking, simulation results indicate that the MDPFGAs-based algorithm has greater efficiency and speed than the FGAs-based algorithm.
Keywords: multitarget tracking; multi-deme; Fuzzy Genetic Algorithm (FGA); parallelization; Message Passing Interface (MPI)
An efficient parallel algorithm for ocean circulation numerical model based on irregular rectangle decomposition scheme
6
Authors: ZHUANG Zhanpeng, YUAN Yeli, ZHANG Jie, HAN Lei, YANG Jungang. Acta Oceanologica Sinica (SCIE, CAS, CSCD), 2016, No. 5, pp. 18-23, 6 pages.
A parallel algorithm for a circulation numerical model based on message passing interface (MPI) is developed using serialization and an irregular rectangle decomposition scheme. A neighboring point exchange strategy (NPES) is adopted to further enhance the computational efficiency. Two experiments are conducted on an HP C7000 Blade System; the numerical results show that the parallel version with NPES (PVN) produces higher efficiency than the original parallel version (PV). The PVN achieves a parallel efficiency in excess of 0.9 in the second experiment when the number of processors increases to 100, while the efficiency of PV rapidly decreases to 0.39. The PVN of the ocean circulation model is used in a fine-resolution regional simulation, which produces better results. The capability of universal implementation of this algorithm makes it potentially applicable to many other ocean models.
Keywords: irregular rectangle decomposition scheme; message passing interface (MPI); neighboring point exchange strategy; data communication
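The efficiency figures quoted in the abstract above can be related to speedup through the usual definitions, speedup S = T_serial / T_parallel and parallel efficiency E = S / p for p processors. The short Python sketch below uses those standard definitions; the function names are hypothetical and the abstract does not state how the authors measured their timings.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency E = S / p."""
    return speedup(t_serial, t_parallel) / n_procs


def implied_speedup(efficiency, n_procs):
    """Invert E = S / p to recover the speedup implied by an efficiency."""
    return efficiency * n_procs


# Efficiencies reported in the abstract at 100 processors:
pvn_speedup = implied_speedup(0.9, 100)   # PVN: speedup of roughly 90 or more
pv_speedup = implied_speedup(0.39, 100)   # PV: speedup of roughly 39
```

Under these definitions, an efficiency above 0.9 on 100 processors corresponds to a speedup above about 90, against about 39 for the original parallel version.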
High Performance MPI over the Slingshot Interconnect
7
Authors: Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2023, No. 1, pp. 128-145, 18 pages.
The Slingshot interconnect designed by HPE/Cray is becoming more relevant in high-performance computing with its deployment on the upcoming exascale systems. In particular, it is the interconnect empowering the first exascale and highest-ranked supercomputer in the world, Frontier. It offers various features such as adaptive routing, congestion control, and isolated workloads. The deployment of newer interconnects sparks interest related to performance, scalability, and any potential bottlenecks, as they are critical elements contributing to the scalability across nodes on these systems. In this paper, we delve into the challenges the Slingshot interconnect poses with current state-of-the-art MPI (message passing interface) libraries. In particular, we look at the scalability performance when using Slingshot across nodes. We present a comprehensive evaluation using various MPI and communication libraries including Cray MPICH, Open-MPI+UCX, RCCL, and MVAPICH2 on CPUs and GPUs on the Spock system, an early access cluster deployed with Slingshot-10, AMD MI100 GPUs and AMD Epyc Rome CPUs to emulate the Frontier system. We also evaluate preliminary CPU-based support of MPI libraries on the Slingshot-11 interconnect.
Keywords: AMD GPU; interconnect technology; MPI (message passing interface); Slingshot
High-Performance Flow Classification of Big Data Using Hybrid CPU-GPU Clusters of Cloud Environments
8
Authors: Azam Fazel-Najafabadi, Mahdi Abbasi, Hani H. Attar, Ayman Amer, Amir Taherkordi, Azad Shokrollahi, Mohammad R. Khosravi, Ahmed A. Solyman. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2024, No. 4, pp. 1118-1137, 20 pages.
The network switches in the data plane of Software Defined Networking (SDN) are empowered by an elementary process, in which an enormous number of packets, resembling big volumes of data, are classified into specific flows by matching them against a set of dynamic rules. This basic process accelerates the processing of data, so that instead of processing singular packets repeatedly, corresponding actions are performed on corresponding flows of packets. In this paper, we first address the limitations of a typical packet classification algorithm, Tuple Space Search (TSS). Then, we present a set of different scenarios to parallelize it on different parallel processing platforms, including Graphics Processing Units (GPUs), clusters of Central Processing Units (CPUs), and hybrid clusters. Experimental results show that the hybrid cluster provides the best platform for parallelizing packet classification algorithms, promising an average throughput rate of 4.2 million packets per second (Mpps). That is, the hybrid cluster produced by the integration of Compute Unified Device Architecture (CUDA), Message Passing Interface (MPI), and the OpenMP programming model could classify 0.24 million packets per second more than the GPU cluster scheme. Such a packet classifier satisfies the required processing speed in the programmable network systems that would be used to communicate big medical data.
Keywords: OpenMP; Compute Unified Device Architecture (CUDA); Message Passing Interface (MPI); packet classification; medical data; tuple space algorithm; Graphics Processing Unit (GPU) cluster
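For readers unfamiliar with Tuple Space Search, the Python sketch below shows the basic idea in a simplified form: rules are grouped by their tuple of prefix lengths, each tuple gets its own hash table, and a lookup masks the packet fields once per tuple and probes that table. The two-field rules, the 8-bit field width, and all class and method names are assumptions chosen for readability; this is not the implementation evaluated in the paper.

```python
WIDTH = 8  # assumed field width in bits, kept small for readability


def mask(value, prefix_len):
    """Keep the top prefix_len bits of a WIDTH-bit value."""
    shift = WIDTH - prefix_len
    return (value >> shift) << shift


class TupleSpace:
    def __init__(self):
        # (src_len, dst_len) -> {(masked_src, masked_dst): (priority, action)}
        self.tables = {}

    def add_rule(self, src, src_len, dst, dst_len, priority, action):
        """Insert a rule into the hash table of its prefix-length tuple."""
        table = self.tables.setdefault((src_len, dst_len), {})
        key = (mask(src, src_len), mask(dst, dst_len))
        old = table.get(key)
        if old is None or priority > old[0]:
            table[key] = (priority, action)

    def classify(self, src, dst):
        """Probe every tuple's table; return the highest-priority action."""
        best = None
        for (src_len, dst_len), table in self.tables.items():
            hit = table.get((mask(src, src_len), mask(dst, dst_len)))
            if hit is not None and (best is None or hit[0] > best[0]):
                best = hit
        return None if best is None else best[1]
```

The probes of the individual tuples do not depend on one another, which is presumably what makes the algorithm a candidate for the GPU and cluster parallelization the abstract describes.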
An MPI+OpenACC-Based PRM Scalar Advection Scheme in the GRAPES Model over a Cluster with Multiple CPUs and GPUs (cited by: 2)
9
Authors: Huadong Xiao, Yang Lu, Jianqiang Huang, Wei Xue. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2022, No. 1, pp. 164-173, 10 pages.
A moisture advection scheme is an essential module of a numerical weather/climate model representing the horizontal transport of water vapor. The Piecewise Rational Method (PRM) scalar advection scheme in the Global/Regional Assimilation and Prediction System (GRAPES) solves the moisture flux advection equation based on PRM. Computation of the scalar advection involves boundary exchange, and computation of higher bandwidth requirements is complicated and time-consuming in GRAPES. Recently, Graphics Processing Units (GPUs) have been widely used to solve scientific and engineering computing problems owing to advancements in GPU hardware and related programming models such as CUDA/OpenCL and Open Accelerator (OpenACC). Herein, we present an accelerated PRM scalar advection scheme with Message Passing Interface (MPI) and OpenACC to fully exploit GPUs’ power over a cluster with multiple Central Processing Units (CPUs) and GPUs, together with optimizations such as minimizing data transfer, memory coalescing, exposing more parallelism, and overlapping computation with data transfers. Results show that a speedup of about 3.5 times is obtained for the entire model running at medium resolution with double precision when comparing the scheme’s elapsed time on a node with two GPUs (NVIDIA P100) and two 16-core CPUs (Intel Gold 6142). Further, results obtained from experiments with a higher-resolution model on multiple GPUs show excellent scalability.
Keywords: Graphics Processing Unit (GPU) computing; Open Accelerator (OpenACC); Message Passing Interface (MPI); Global/Regional Assimilation and Prediction System (GRAPES); Piecewise Rational Method (PRM) scalar advection scheme
GPU acceleration of a nonhydrostatic model for the internal solitary waves simulation (cited by: 1)
10
Authors: 陈同庆, 张庆河. Journal of Hydrodynamics (SCIE, EI, CSCD), 2013, No. 3, pp. 362-369, 8 pages.
The parallel computing algorithm for a nonhydrostatic model on one or multiple Graphic Processing Units (GPUs) for the simulation of internal solitary waves is presented and discussed. The computational efficiency of the GPU scheme is analyzed by a series of numerical experiments, including an ideal case and field-scale simulations, performed on a workstation and a supercomputer system. The calculated results show that the speedup of the developed GPU-based parallel computing scheme, compared to the implementation on a single CPU core, increases with the number of computational grid cells, and that for problems with a relatively large number of grid cells the speedup can increase quasi-linearly with the number of GPUs involved, up to 32 GPUs.
Keywords: Graphic Processing Unit (GPU); internal solitary wave; nonhydrostatic model; speedup; Message Passing Interface (MPI)
Fast Multicast on Multistage Interconnection Networks Using Multi-Head Worms
11
Authors: 王晓东, 徐明, 周兴铭. Journal of Computer Science & Technology (SCIE, EI, CSCD), 1999, No. 3, pp. 250-258, 9 pages.
This paper proposes a new approach for implementing fast multicast on multistage interconnection networks (MINs) with multi-head worms. For an MIN with n stages of k×k switches, a single multi-head worm can cover an arbitrary set of destinations with a single communication start-up. Compared with schemes using unicast messages, this approach reduces multicast latency significantly and performs better than multi-destination worms.
Keywords: multicast; message passing interface (MPI); multi-head worm; multistage interconnection networks (MINs); wormhole routing
A new method to retrieve aerosol optical thickness from satellite images on a parallel system
12
Authors: Jianping Guo, Huadong Xiao, Yong Xue, Huizheng Che, Xiaoye Zhang, Chunxiang Cao, Jie Guang, Hao Zhang. Particuology (SCIE, EI, CAS, CSCD), 2009, No. 5, pp. 392-398, 7 pages.
A wide variety of algorithms have been developed to monitor aerosol burden from satellite images. Still, few solutions currently allow for real-time and efficient retrieval of aerosol optical thickness (AOT), mainly due to the extremely large volume of computation necessary for the numeric solution of atmospheric radiative transfer equations. Taking into account the efforts to exploit the SYNergy of Terra and Aqua Modis (SYNTAM, an AOT retrieval algorithm), we present in this paper a novel method to retrieve AOT from Moderate Resolution Imaging Spectroradiometer (MODIS) satellite images, in which a strategy of block partition and collective communication is adopted, thereby maximizing load balance and reducing the overhead time of inter-processor communication. Experiments were carried out to retrieve AOT at 0.44, 0.55, and 0.67 μm from MODIS/Terra and MODIS/Aqua data, using the parallel SYNTAM algorithm on the IBM System Cluster 1600 deployed at the China Meteorological Administration (CMA). Results showed that the parallel implementation can greatly reduce computation time, and thus ensure high parallel efficiency. AOT derived by the parallel algorithm was validated against measurements from ground-based sun-photometers; in all cases, the relative error was within 20%, which demonstrated that the parallel algorithm is suitable for applications such as air quality monitoring and climate modeling.
Keywords: AOT; Parallel computation; Block partitioning; Message Passing Interface (MPI)