72 articles found
Parallel Image Processing: Taking Grayscale Conversion Using OpenMP as an Example
1
Authors: Bayan AlHumaidan, Shahad Alghofaily, Maitha Al Qhahtani, Sara Oudah, Naya Nagy. Journal of Computer and Communications, 2024, No. 2, pp. 1-10 (10 pages).
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored parallel image processing, with a focus on the grayscale conversion of color images. Our approach integrated OpenMP into our framework to parallelize this critical image processing task. By using OpenMP, we improved the overall performance of the conversion by distributing the workload across multiple threads. The primary objectives of the project were to reduce computation time and improve overall efficiency in the grayscale conversion of color images. Using OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency as the number of cores increased. This underscored the importance of a carefully optimized parallelization strategy that considers factors such as load balancing and minimizing communication overhead. Despite these challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP's effectiveness in accelerating image manipulation tasks.
Keywords: Parallel Computing; Image Processing; OpenMP; Parallel Programming; High Performance Computing; GPU (Graphic Processing Unit)
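The workload distribution described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' OpenMP code: the function names, the ITU-R BT.601 luma weights, and the use of a thread pool in place of OpenMP threads are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed luma weights (ITU-R BT.601); the abstract does not say which were used.
R_W, G_W, B_W = 0.299, 0.587, 0.114


def gray_row(row):
    """Convert one row of (r, g, b) tuples to integer gray levels."""
    return [int(round(R_W * r + G_W * g + B_W * b)) for r, g, b in row]


def to_grayscale(image, workers=4):
    """Convert a list of rows, handing rows to a pool of worker threads.

    Rows are independent of each other, so they can be processed in any
    order; pool.map returns the results in the original row order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gray_row, image))


image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = to_grayscale(image)
```

In CPython the global interpreter lock prevents a real speedup for a pure-Python loop like this one, so the sketch only shows how the rows are divided among workers, which is the role an OpenMP parallel loop would play in C or C++.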
Volumetric lattice Boltzmann method for pore-scale mass diffusion-advection process in geopolymer porous structures
2
Authors: Xiaoyu Zhang, Zirui Mao, Floyd W. Hilty, Yulan Li, Agnes Grandjean, Robert Montgomery, Hans-Conrad zur Loye, Huidan Yu, Shenyang Hu. Journal of Rock Mechanics and Geotechnical Engineering (SCIE, CSCD), 2024, No. 6, pp. 2126-2136 (11 pages).
Porous materials present significant advantages for absorbing radioactive isotopes in nuclear waste streams. To improve absorption efficiency in nuclear waste treatment, a thorough understanding of the diffusion-advection process within porous structures is essential for material design. In this study, we present advancements in the volumetric lattice Boltzmann method (VLBM) for modeling and simulating pore-scale diffusion-advection of radioactive isotopes within geopolymer porous structures. These structures are created using the phase field method (PFM) to precisely control pore architectures. In our VLBM approach, we introduce a concentration field of an isotope coupled with the velocity field and solve it through the time evolution of its particle population function. To address the computational intensity inherent in the coupled lattice Boltzmann equations for the velocity and concentration fields, we implement graphics processing unit (GPU) parallelization. The model is validated by examining the flow and diffusion fields in porous structures. Good agreement is observed both between the velocity fields from VLBM and the multiphysics object-oriented simulation environment (MOOSE) and between the concentration fields from VLBM and the finite difference method (FDM). Furthermore, we investigate the effects of background flow, species diffusivity, and porosity on the diffusion-advection behavior by varying the background flow velocity, diffusion coefficient, and pore volume fraction, respectively. All three parameters influence the diffusion-advection process. Increased background flow and diffusivity markedly accelerate the process owing to increased advection intensity and enhanced diffusion capability, respectively. Conversely, increasing the porosity has a less significant effect, slightly slowing the diffusion-advection process because of the expanded pore volume. This parametric study provides insights into the kinetics of isotope uptake in porous structures, facilitating the development of porous materials for nuclear waste treatment applications.
Keywords: Volumetric lattice Boltzmann method (VLBM); Phase field method (PFM); Pore-scale diffusion-advection; Nuclear waste treatment; Porous media flow; Graphics processing unit (GPU); Parallelization
Optimization of a precise integration method for seismic modeling based on graphic processing unit (Cited by: 2)
3
Authors: Jingyu Li, Genyang Tang, Tianyue Hu. Earthquake Science (CSCD), 2010, No. 4, pp. 387-393 (7 pages).
General-purpose graphic processing unit (GPU) computing is increasingly used in various fields. Its single-instruction, multiple-thread mode is well suited to seismic numerical simulation, which involves huge quantities of data and many calculation steps. In this study, we introduce a GPU-based parallel implementation of the precise integration method (PIM) for seismic forward modeling. Compared with single-core CPU calculation, the GPU parallel calculation fully retains the features of PIM, namely small bandwidth, high accuracy, and the capability of modeling complex substructures, while delivering high computational efficiency. High-performance GPU parallel calculation can therefore bring seismic forward modeling closer to real seismic records.
Keywords: precise integration method; seismic modeling; general purpose GPU; graphic processing unit
TIME-DOMAIN INTERPOLATION ON GRAPHICS PROCESSING UNIT (Cited by: 1)
4
Authors: XIQI LI, GUOHUA SHI, YUDONG ZHANG. Journal of Innovative Optical Health Sciences (SCIE, EI, CAS), 2011, No. 1, pp. 89-95 (7 pages).
The signal processing speed of spectral domain optical coherence tomography (SD-OCT) has become a bottleneck in many medical applications. Recently, a time-domain interpolation method was proposed. Compared with the commonly used zero-padding interpolation method, it achieves a better signal-to-noise ratio (SNR) with much less signal processing time in SD-OCT data processing. Additionally, each resampled data point can be obtained from a few data points and coefficients in the cutoff window, so many interpolations can be performed simultaneously. This makes the method suitable for parallel computing. By using a graphics processing unit (GPU) and the compute unified device architecture (CUDA) programming model, time-domain interpolation can be accelerated significantly. The processing rate exceeds 250,000, 200,000, and 160,000 A-lines per second for 2,048-pixel OCT when the cutoff length is L = 11, L = 21, and L = 31, respectively. A frame of SD-OCT data (400 A-lines × 2,048 pixels per line) is acquired and processed on the GPU in real time. The results show that signal processing of an SD-OCT frame can be finished in 6.223 ms when the cutoff length is L = 21, which is much faster than on a central processing unit (CPU). Real-time signal processing of acquired data can thus be realized.
Keywords: Optical coherence tomography; real-time signal processing; graphics processing unit; GPU; CUDA
The inversion of density structure by graphic processing unit (GPU) and identification of igneous rocks in Xisha area (Cited by: 1)
5
Authors: Lei Yu, Jian Zhang, Wei Lin, Rongqiang Wei, Shiguo Wu. Earthquake Science, 2014, No. 1, pp. 117-125 (9 pages).
Organic reefs, the targets of deep-water petroleum exploration, developed widely in the Xisha area. However, there are concealed igneous rocks under the sea whose wave impedance is nearly equal to that of organic reefs. Because of their similar seismic reflection characteristics, the igneous rocks interfere with future exploration. Yet the density and magnetism of organic reefs are very different from those of igneous rocks, so gravity and magnetic data have obvious advantages for distinguishing the two. First, frequency decomposition was applied to the free-air gravity anomaly in the Xisha area to obtain the 2D subdivision of the gravity anomaly and magnetic anomaly in the vertical direction. The horizontal distribution of igneous rocks can then be obtained from the high-frequency field, the low-frequency field, and their physical properties. Next, 3D forward modeling of the gravitational field was carried out to establish the density model of the area, with reference to rock physical properties from previous research. Furthermore, 3D inversion of the gravity anomaly using a genetic algorithm with graphic processing unit (GPU) parallel processing was applied to the Xisha target area, and the 3D density structure of the area was obtained. In this way, the igneous rocks can be confined to a certain depth according to their density. Frequency decomposition and 3D inversion of the gravity anomaly by a GPU-parallel genetic algorithm proved to be a useful method for locating igneous rocks in their 3D geological position. Organic reefs and igneous rocks can therefore be distinguished, which provides predictive information for further exploration.
Keywords: Xisha area; Organic reefs and igneous rocks; Frequency decomposition of potential field; 3D inversion of the graphic processing unit (GPU) parallel processing
Simulation of fluid-structure interaction in a microchannel using the lattice Boltzmann method and size-dependent beam element on a graphics processing unit
6
Authors: Vahid Esfahanian, Esmaeil Dehdashti, Amir Mehdi Dehrouye-Semnani. Chinese Physics B (SCIE, EI, CAS, CSCD), 2014, No. 8, pp. 389-395 (7 pages).
Fluid-structure interaction (FSI) problems in microchannels play a prominent role in many engineering applications. The present study is an effort toward the simulation of flow in a microchannel considering FSI. The bottom boundary of the microchannel is simulated by size-dependent beam elements for the finite element method (FEM) based on a modified couple stress theory. The lattice Boltzmann method (LBM) using the D2Q13 LB model is coupled to the FEM in order to solve the fluid part of the FSI problem. Because the LBM generally needs only nearest-neighbor information, the algorithm is an ideal candidate for parallel computing. The simulations are carried out on graphics processing units (GPUs) using the compute unified device architecture (CUDA). In the present study, the governing equations are non-dimensionalized and the set of dimensionless groups is presented to show their effects on micro-beam displacement. The numerical results show that the displacements of the micro-beam predicted by the size-dependent beam element are smaller than those predicted by the classical beam element.
Keywords: fluid-structure interaction; graphics processing unit; lattice Boltzmann method; size-dependent beam element
Multi-relaxation-time lattice Boltzmann simulations of lid driven flows using graphics processing unit
7
Authors: Chenggong LI, J.P.Y. MAA. Applied Mathematics and Mechanics (English Edition) (SCIE, EI, CSCD), 2017, No. 5, pp. 707-722 (16 pages).
Large eddy simulation (LES) using the Smagorinsky eddy viscosity model is added to the two-dimensional nine velocity components (D2Q9) lattice Boltzmann equation (LBE) with multi-relaxation-time (MRT) to simulate incompressible turbulent cavity flows with Reynolds numbers up to 1 × 10^7. To improve the computational efficiency of the LBM in numerical simulations of turbulent flows, the massively parallel computing power of a graphic processing unit (GPU) with the compute unified device architecture (CUDA) is introduced into the MRT-LBE-LES model. The model performs well compared with results from others, with a 76-fold increase in computational efficiency. It appears that the higher the Reynolds number is, the smaller the Smagorinsky constant should be, if the lattice number is fixed. Also, for a selected high Reynolds number and a selected proper Smagorinsky constant, there is a minimum requirement for the lattice number so that the Smagorinsky eddy viscosity will not be excessively large.
Keywords: large eddy simulation (LES); multi-relaxation-time (MRT); lattice Boltzmann equation (LBE); two-dimensional nine velocity components (D2Q9); Smagorinsky model; graphic processing unit (GPU); computing unified device architecture (CUDA)
Compute Unified Device Architecture Implementation of Euler/Navier-Stokes Solver on Graphics Processing Unit Desktop Platform for 2-D Compressible Flows
8
Authors: Zhang Jiale, Chen Hongquan. Transactions of Nanjing University of Aeronautics and Astronautics (EI, CSCD), 2016, No. 5, pp. 536-545 (10 pages).
A personal desktop platform with teraflops peak performance from thousands of cores is realized at the price of a conventional workstation by using programmable graphics processing units (GPUs). A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows by using NVIDIA's Compute Unified Device Architecture (CUDA) programming model in the CUDA Fortran programming language. The techniques for implementing CUDA kernels, the double-layered thread hierarchy, and the varied memory hierarchy are presented to form the GPU-based algorithm for the Euler/Navier-Stokes equations. The resulting parallel solver is validated by a set of typical test flow cases. The numerical results show that a speedup of dozens of times relative to a serial CPU implementation can be achieved using a single-GPU desktop platform, which demonstrates that a GPU desktop can serve as a cost-effective parallel computing platform to accelerate computational fluid dynamics (CFD) simulations substantially.
Keywords: graphics processing unit (GPU); GPU parallel computing; compute unified device architecture (CUDA) Fortran; finite volume method (FVM); acceleration
Graphic Processing Unit-Accelerated Neural Network Model for Biological Species Recognition
9
Authors: 温程璐, 潘伟, 陈晓熹, 祝青园. Journal of Donghua University (English Edition) (EI, CAS), 2012, No. 1, pp. 5-8 (4 pages).
A graphic processing unit (GPU)-accelerated biological species recognition method using a partially connected neural evolutionary network model is introduced in this paper. The partially connected neural evolutionary network adopted in the paper overcomes the small-input limitation of traditional neural networks. The whole image is taken as the input of the neural network, so as many features as possible are retained for recognition. To speed up the recognition process, a fast implementation of the partially connected neural network was developed on an NVIDIA Tesla C1060 using the NVIDIA compute unified device architecture (CUDA) framework. Image sets of eight biological species were obtained to test the GPU implementation and its serial CPU counterpart. The experimental results showed that the GPU implementation works effectively in both recognition rate and speed, achieving a speedup of 343 over the CPU implementation. Compared with feature-based recognition methods on the same task, the method also achieved an acceptable correct rate of 84.6% when tested on the eight biological species.
Keywords: graphic processing unit (GPU); compute unified device architecture (CUDA); neural network; species recognition
The Design of a Graphical User Environment for Numerical Simulation of Powder Forming Processes
10
Authors: A R Khoei, S Keshavarz. Journal of Xiamen University (Natural Science) (CAS, CSCD, PKU Core), 2002, No. S1 (2 pages).
As computer simulation increasingly supports engineering design and manufacture, the requirement for a computer software environment providing an integration platform for computational engineering software increases. A key component of an integrated environment is the use of computational engineering to assist and support solutions for complex design. Computer methods for structural, flow and thermal analysis are well developed and have been used in design for many years. Many software packages are now available which provide an advanced capability. However, they are not designed for modelling of powder forming processes. This paper describes the powder compaction software (PCS_SUT), which is designed for pre- and post-processing for computational simulation of the powder compaction process.
In the PCS_SUT software, the adaptive analysis of the transient metal powder forming process is simulated by the finite element method based on deformation theories. Error estimates and adaptive remeshing schemes are applied for updated co-ordinate analysis. A generalized Newmark scheme is used for the time domain discretization and the final nonlinear equations are solved by a Newton-Raphson procedure. An incremental elasto-plastic material model is used to simulate the compaction process. To describe the constitutive model of the nonlinear behaviour of powder materials, a combination of the Mohr-Coulomb and elliptical yield cap models is applied. This model reflects the yielding, frictional and densification characteristics of powder along with the strain and geometrical hardening which occur during the compaction process. A hardening rule is used to define the dependence of the yield surface on the degree of plastic straining.
A plasticity theory for friction is employed in the treatment of the powder-tooling interface. The involvement of two different materials, which are in contact and move relative to each other, must be considered. A special formulation for friction modelling is coupled with a material formulation. The interface behaviour between the die and the powder is modelled by using an interface element mesh.
In the present paper, we have demonstrated pre- and post-processor finite element software, written in Visual Basic, to generate the graphical model and visually display the computed results. The software consists of three main parts:
· Pre-processor: It is used to create the model, generate an appropriate finite element grid, apply the appropriate boundary conditions, and view the total model. The geometric model can be used to associate the mesh with physical attributes such as element properties, material properties, or loads and boundary conditions.
· Analysis: It can deal with two-dimensional and axi-symmetric applications for linear and non-linear behaviour of material in static and dynamic analyses. Both triangular and quadrilateral elements are available in the element library, including 3-noded, 6-noded and 7-noded (T6B1) triangles and 4-noded, 8-noded and 9-noded quadrilaterals. The direct implicit algorithm based on the generalized Newmark scheme is used for the time integration and an automatic time step control facility is provided. For non-linear iteration, choices among the full or modified Newton-Raphson method and quasi-Newton methods, using the initial stiffness method, Davidon inverse method or BFGS inverse method, are possible.
· Post-processor: It provides visualization of the computed results when the finite element model and analysis have been completed. Post-processing is vital to allow the appropriate interpretation of the completed results of the finite element analysis. It provides the visual means to interpret the vast amounts of computed results generated.
Finally, the powder behaviour during the compaction of a multi-level component is numerically simulated by the PCS_SUT software, as shown in Fig. 1. The predicted compaction forces at different displacements are computed and compared with the available experimental ...
Keywords: The Design of a Graphical User Environment for Numerical Simulation of Powder Forming Processes
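Batch Image Encryption Algorithm Optimized with Chaos, a Thread Pool, and GPU
11
Authors: 潘明华, 王一涵, 谷盛民, 孙绍华. Science Technology and Engineering (PKU Core), 2023, No. 34, pp. 14618-14626 (9 pages).
Large data volume and high redundancy are defining features of digital images, which makes fast, real-time encryption of large image batches challenging. To address this problem, a batch image encryption algorithm based on Lorenz chaotic encryption is designed, optimized by combining a thread pool with a graphics processing unit (GPU). The algorithm uses a thread pool to improve image reading and writing and applies a mirror transformation to the images; it generates the encryption sequence with the Lorenz chaotic system and encrypts in combination with chaotic sequences of the image blocks; it then packs the batch image data and performs large-batch asynchronous computation on the GPU; finally, it reassembles the image matrices to obtain the batch of encrypted images. Experimental tests show that the algorithm effectively resists common attacks and that the performance-optimized batch encryption algorithm preserves image security, while batch image reading speed and encryption/decryption efficiency improve significantly.
Keywords: image encryption; chaotic system; parallel computing; thread pool; graphics processing unit (GPU)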
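The keystream step of this abstract can be illustrated with a toy Python sketch in which a Lorenz trajectory drives an XOR cipher. It is only an illustration under stated assumptions: the Euler integration, the step size, the burn-in length, and the byte quantization are hypothetical choices, and the mirror transformation, block-wise sequences, thread pool, and GPU batching of the paper are left out. It is not a secure cipher.

```python
def lorenz_keystream(n, x=0.1, y=0.2, z=0.3, dt=0.01, burn_in=1000):
    """Return n key bytes derived from a Lorenz trajectory (Euler steps).

    sigma, rho and beta are the classic chaotic parameter values; the
    initial state (x, y, z) plays the role of the secret key here.
    """
    sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0
    out = bytearray()
    for step in range(burn_in + n):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        if step >= burn_in:
            # Hypothetical quantization: scale |x| and keep the low byte.
            out.append(int(abs(x) * 1e6) % 256)
    return bytes(out)


def xor_cipher(pixels, key_state):
    """Encrypt or decrypt a bytes object; XOR is its own inverse."""
    stream = lorenz_keystream(len(pixels), *key_state)
    return bytes(p ^ k for p, k in zip(pixels, stream))


key = (0.1, 0.2, 0.3)
plain = bytes(range(16))          # stand-in for flattened pixel data
cipher = xor_cipher(plain, key)
restored = xor_cipher(cipher, key)
```

Because the same initial state reproduces the same trajectory, applying `xor_cipher` twice with the same key returns the original bytes.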
Study on the particle breakage of ballast based on a GPU accelerated discrete element method (Cited by: 3)
12
Authors: Guang-Yu Liu, Wen-Jie Xu, Qi-Cheng Sun, Nicolin Govender. Geoscience Frontiers (SCIE, CAS, CSCD), 2020, No. 2, pp. 461-471 (11 pages).
Particle breakage greatly influences the mechanical behavior of granular materials (GM) such as ballast, rockfill and sand under external loads. The discrete element method (DEM) is one of the most popular methods for simulating GM, as each particle is represented individually. To study the mechanism of particle breakage, a cohesive contact mode is developed based on the GPU-accelerated DEM code Blaze-DEM. A database of 3D geometry models of rock blocks is established using 3D scanning. An agglomerate that describes each rock block with a series of non-overlapping spherical particles is used to build the DEM numerical model of a railway ballast sample, which is then used in a DEM oedometric test to study the particle breakage characteristics of the sample under external load. Furthermore, to obtain the meso-mechanical parameters used in the DEM, a back-analysis method based on laboratory tests of the rock sample is used. Based on the DEM numerical tests, the particle breakage process and mechanisms of the railway ballast are studied. The results show that the developed code is well suited to large-scale simulation of particle breakage in granular material.
Keywords: Discrete element method (DEM); Particle breakage; graphical processing unit (GPU); Railway ballast; Granular material (GM)
Stability analysis for flow past a cylinder via lattice Boltzmann method and dynamic mode decomposition (Cited by: 2)
13
Authors: 张伟, 王勇, 钱跃竑. Chinese Physics B (SCIE, EI, CAS, CSCD), 2015, No. 6, pp. 378-384 (7 pages).
A combination of the lattice Boltzmann method and the recently developed dynamic mode decomposition is proposed for stability analysis. The simulations are performed on a graphical processing unit. Stability of the flow past a cylinder at a supercritical state, Re = 50, is studied with this combination for both the exponential growth and the limit cycle regimes. The Ritz values, energy spectrum, and modes for both regimes are presented and compared with the Koopman eigenvalues. For harmonic-like periodic flow in the limit cycle, global analysis from the combination gives the same results as those from the Koopman analysis. For transient flow, as in the exponential growth regime, the combination can provide more reasonable results. It is demonstrated that the combination of the lattice Boltzmann method and dynamic mode decomposition is powerful and can be used for stability analysis of more complex flows.
Keywords: lattice Boltzmann; dynamic mode decomposition; stability analysis; graphical processing unit
Fast modeling of gravity gradients from topographic surface data using GPU parallel algorithm
14
Authors: Xuli Tan, Qingbin Wang, Jinkai Feng, Yan Huang, Ziyan Huang. Geodesy and Geodynamics (CSCD), 2021, No. 4, pp. 288-297 (10 pages).
The gravity gradient is the second derivative of the gravity potential and contains more high-frequency information about Earth's gravity field. Gravity gradient observations require their prior and intrinsic parts to be deducted in order to obtain more variational information. A model generated from a topographic surface database is more appropriate for representing gradiometric effects derived from near-surface mass, as other kinds of data can hardly reach the required spatial resolution. The rectangle prism method, namely an analytic integration of Newtonian potential integrals, is a reliable and commonly used approach to modeling the gravity gradient, but its computing efficiency is extremely low. A modified rectangle prism method and a graphical processing unit (GPU) parallel algorithm were proposed to speed up the modeling process. The modified method avoided massive redundant computations by transforming the formulas according to the symmetries of the prisms' integral regions, and the proposed algorithm parallelized this method's computing process. The parallel algorithm was compared with a conventional serial algorithm using 100 elevation data in two topographic areas (rough and moderate terrain). Modeling differences between the two algorithms were less than 0.1 E, which is attributed to precision differences between single-precision and double-precision floating-point numbers. The parallel algorithm was approximately 200 times more efficient than the serial algorithm in experiments, demonstrating its effectiveness in speeding up the modeling process. Further analysis indicates that both the modified method and GPU parallelism contributed to the proposed algorithm's performance in experiments.
Keywords: Gravity gradient; Topographic surface data; Rectangle prism method; Parallel computation; graphical processing unit (GPU)
An enhanced GPU reduction at the warp-level
15
Authors: Hou Neng, He Fazhi, Zhou Yi. Computer Aided Drafting, Design and Manufacturing, 2016, No. 2, pp. 43-52 (10 pages).
In recent years, graphical processing unit (GPU)-accelerated intelligent algorithms have been widely used for solving combinatorial optimization problems, which are NP-hard. These intelligent algorithms involve a common operation, namely reduction, in which the most suitable candidate solution in the neighborhood is selected. As reduction is one of the main procedures, it is necessary to optimize it on the GPU. In this paper, we propose an enhanced warp-based reduction on the GPU. Compared with existing block-based reduction methods, our method efficiently exploits the potential of a warp-level implementation, which better matches the characteristics of current GPU architecture. Firstly, to improve global memory access performance, vectorized access is used. Secondly, at the level of thread block reduction, an enhanced warp-based reduction on shared memory is presented to form partial results. Thirdly, the number of thread blocks is obtained by maximizing the thread block size and the maximum number of threads per streaming multiprocessor on the GPU. Finally, the proposed method is evaluated on three generations of NVIDIA GPUs and performs better than previous methods.
Keywords: reduction; graphical processing unit; computing unified device architecture; warp-level reduction
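The idea of reducing within a warp and then combining per-warp partial results can be simulated on the CPU. The sketch below is a hedged Python illustration of a generic shuffle-down tree reduction, not the enhanced method of the paper; the function names and the choice of a minimum reduction are assumptions.

```python
WARP_SIZE = 32  # lanes per warp on current NVIDIA GPUs


def warp_reduce_min(lanes):
    """Simulate a shuffle-down tree reduction over one warp.

    After log2(32) = 5 steps, lane 0 holds the minimum of all lanes.
    """
    assert len(lanes) == WARP_SIZE
    vals = list(lanes)
    offset = WARP_SIZE // 2
    while offset > 0:
        # Every lane reads the value lane (i + offset) held before this
        # step; lanes whose partner is out of range keep their own value.
        prev = vals[:]
        for i in range(WARP_SIZE):
            if i + offset < WARP_SIZE:
                vals[i] = min(prev[i], prev[i + offset])
        offset //= 2
    return vals[0]


def block_reduce_min(data):
    """Reduce a non-empty list warp by warp, then combine the partials."""
    pad = (-len(data)) % WARP_SIZE
    padded = list(data) + [float("inf")] * pad
    partials = [warp_reduce_min(padded[i:i + WARP_SIZE])
                for i in range(0, len(padded), WARP_SIZE)]
    result = partials[0]
    for p in partials[1:]:
        result = min(result, p)
    return result


costs = [(7 * i + 3) % 101 for i in range(100)]  # made-up candidate costs
best = block_reduce_min(costs)
```

On a real GPU the inner loop over lanes runs simultaneously in hardware, so each warp needs only five steps and no shared-memory synchronization within the warp.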
Implementing Delay Multiply and Sum Beamformer on a Hybrid CPU-GPU Platform for Medical Ultrasound Imaging Using OpenMP and CUDA (Cited by: 2)
16
Authors: Ke Song, Paul Liu, Dongquan Liu. Computer Modeling in Engineering & Sciences (SCIE, EI), 2021, No. 9, pp. 1133-1150 (18 pages).
A novel beamforming algorithm named Delay Multiply and Sum (DMAS), which excels at enhancing the resolution and contrast of ultrasonic images, has recently been proposed. However, the algorithm contains nested loops, so its computational complexity is higher than that of the Delay and Sum (DAS) beamformer widely used in industry. We therefore propose a simple vector-based method to lower its complexity. The key point is to transform the nested loops into several vector operations, which can be implemented efficiently on many parallel platforms, such as Graphics Processing Units (GPUs) and multi-core Central Processing Units (CPUs). Consequently, we implemented the algorithm on such a platform. To maximize the use of computing power, we use the GPU and the multi-core CPU together. The test platform is a low-cost Personal Computer (PC) with a GPU and a multi-core CPU installed. The results show that hybrid use of the CPU and GPU gives a significant performance improvement over using the GPU or the multi-core CPU alone. The performance of the hybrid system is about 47% to 63% higher than that of a single GPU. When 32 elements are used in receiving, the frame rate reaches about 30 fps. In the best case, the frame rate increases to 40 fps.
Keywords: beamforming; delay multiply and sum; graphics processing unit; multi-core central processing unit
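The abstract does not spell out its vector formulation, so the Python sketch below shows one well-known way to remove the nested loop, as an assumption rather than the authors' method. With t_i = sign(s_i)·sqrt(|s_i|), the pairwise sum over i < j equals ((Σ t_i)² − Σ |s_i|) / 2, which needs only two passes over the channels. The samples are assumed to be already delayed (aligned) for one pixel, and the function names are hypothetical.

```python
import math


def signed_sqrt(v):
    """sign(v) * sqrt(|v|), the scaling DMAS applies before multiplying."""
    return math.copysign(math.sqrt(abs(v)), v)


def dmas_nested(samples):
    """Reference DMAS output for one pixel: sum over channel pairs i < j."""
    n = len(samples)
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            total += signed_sqrt(samples[i]) * signed_sqrt(samples[j])
    return total


def dmas_vector(samples):
    """Same value from two sums: ((sum t)^2 - sum t^2) / 2."""
    t = [signed_sqrt(s) for s in samples]
    s1 = sum(t)
    s2 = sum(abs(s) for s in samples)  # t_i squared equals |s_i|
    return (s1 * s1 - s2) / 2.0


samples = [0.8, -0.3, 1.2, 0.5, -0.9, 0.1]  # made-up aligned channel data
slow = dmas_nested(samples)
fast = dmas_vector(samples)
```

The nested form costs O(N²) multiplications per pixel for N channels, while the two-sum form costs O(N), and both sums map directly onto parallel reductions.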
GPU-ACCELERATED FEM SOLVER FOR THREE DIMENSIONAL ELECTROMAGNETIC ANALYSIS 被引量:2
17
Authors: Tian Jin, Gong Li, Shi Xiaowei, Le Xu. Journal of Electronics (China), 2011, No. 4, pp. 615-622 (8 pages)
A new Graphics Processing Unit (GPU) parallelization strategy is proposed to accelerate sparse finite element computation for three-dimensional electromagnetic analysis. The strategy is based on a new compression format called sliced ELL Four (sliced ELL-F). It is designed to accelerate the many addition, dot product, and Sparse Matrix Vector Product (SMVP) operations in the Conjugate Gradient Norm (CGN) calculation of finite element equations. The new implementation of SMVP on GPUs is evaluated. Executed on a GPU, the proposed strategy can efficiently solve sparse finite element equations, especially when the equations are highly sparse (most rows of the coefficient matrix contain fewer than 8 entries). Numerical results show that the sliced ELL-F format-based parallelization strategy can reach significant speedups compared to the Compressed Sparse Row (CSR) format.
Keywords: Finite Element Method (FEM); Graphics Processing Unit (GPU); parallelization strategy; Conjugate Gradient Norm (CGN); sliced ELL Four (sliced ELL-F)
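The details of the sliced ELL-F format are not given in the abstract, so the sketch below only illustrates the plain ELLPACK (ELL) layout that sliced variants build on: every row is padded to the same number of stored entries, which lets each row of the sparse matrix-vector product be computed independently. The function names are hypothetical and the code is sequential Python, not a GPU kernel.

```python
def dense_to_ell(matrix):
    """Convert a dense row-major matrix to ELL: padded values and column indices."""
    width = max((sum(1 for v in row if v != 0) for row in matrix), default=0)
    values, columns = [], []
    for row in matrix:
        nonzeros = [(v, c) for c, v in enumerate(row) if v != 0]
        pad = width - len(nonzeros)
        values.append([float(v) for v, _ in nonzeros] + [0.0] * pad)
        # Padded slots carry value 0.0 and point at column 0 as a dummy index.
        columns.append([c for _, c in nonzeros] + [0] * pad)
    return values, columns


def ell_spmv(values, columns, x):
    """Compute y = A x; each outer iteration is independent of the others."""
    y = []
    for row_values, row_columns in zip(values, columns):
        acc = 0.0
        for v, c in zip(row_values, row_columns):
            acc += v * x[c]  # padded slots contribute 0.0 * x[0]
        y.append(acc)
    return y
```

Plain ELL wastes storage when a few rows are much longer than the rest. Sliced variants reduce this by padding each group of rows only to that group's own maximum width.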
High-throughput volumetric reconstruction for 3D wheat plant architecture studies (Cited by: 1)
18
Authors: Wei Fang, Hui Feng, Wanneng Yang, Lingfeng Duan, Guoxing Chen, Lizhong Xiong, Qian Liu. Journal of Innovative Optical Health Sciences (SCIE, EI, CAS), 2016, No. 5, pp. 101-113 (13 pages)
For many tiller crops, the plant architecture (PA), including the plant fresh weight, plant height, number of tillers, tiller angle and stem diameter, significantly affects the grain yield. In this study, we propose a method based on volumetric reconstruction for high-throughput three-dimensional (3D) wheat PA studies. The proposed methodology involves plant volumetric reconstruction from multiple images, plant model processing, and phenotypic parameter estimation and analysis. This study was performed on 80 Triticum aestivum plants, and the results were analyzed. Comparing the automated measurements with manual measurements, the mean absolute percentage error (MAPE) in the plant height and the plant fresh weight was 2.71% (1.08 cm with an average plant height of 40.07 cm) and 10.06% (1.41 g with an average plant fresh weight of 14.06 g), respectively. The root mean square error (RMSE) was 1.37 cm and 1.79 g for the plant height and plant fresh weight, respectively. The correlation coefficients were 0.95 and 0.96 for the plant height and plant fresh weight, respectively. Additionally, the proposed methodology, including plant reconstruction, model processing and trait extraction, required only approximately 20 s on average per plant using parallel computing on a graphics processing unit (GPU), demonstrating that the methodology would be valuable for a high-throughput phenotyping platform.
Keywords: three-dimensional; volumetric reconstruction; plant architecture; graphics processing unit; high-throughput
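For reference, the two error metrics quoted in the abstract above can be computed as in the following minimal sketch, which uses the standard textbook definitions. The function names and any numbers fed to them are illustrative assumptions, not data from the study.

```python
import math


def mape(measured, reference):
    """Mean absolute percentage error, in percent, relative to the reference."""
    ratios = [abs(m - r) / abs(r) for m, r in zip(measured, reference)]
    return 100.0 * sum(ratios) / len(ratios)


def rmse(measured, reference):
    """Root mean square error, in the same unit as the inputs."""
    squares = [(m - r) ** 2 for m, r in zip(measured, reference)]
    return math.sqrt(sum(squares) / len(squares))
```

Here `measured` would hold the automated estimates and `reference` the manual measurements. MAPE is undefined when a reference value is zero, which is not a concern for plant height or weight.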
A GPU-Based Parallel Algorithm for 2D Large Deformation Contact Problems Using the Finite Particle Method (Cited by: 1)
19
Authors: Wei Wang, Yanfeng Zheng, Jingzhe Tang, Chao Yang, Yaozhi Luo. Computer Modeling in Engineering & Sciences (SCIE, EI), 2021, No. 11, pp. 595-626 (32 pages)
Large deformation contact problems generally involve highly nonlinear behaviors, which are very time-consuming and may lead to convergence issues. The finite particle method (FPM) effectively separates pure deformation from total motion in large deformation problems. In addition, the decoupled procedures of the FPM make it suitable for parallel computing, which may provide an approach to solve time-consuming issues. In this study, a graphics processing unit (GPU)-based parallel algorithm is proposed for two-dimensional large deformation contact problems. The fundamentals of the FPM for planar solids are first briefly introduced, including the equations of motion of particles and the internal forces of quadrilateral elements. Subsequently, a linked-list data structure suitable for parallel processing is built, and parallel global and local search algorithms are presented for contact detection. The contact forces are then derived and directly exerted on particles. The proposed method is implemented with the main solution procedures executed in parallel on a GPU. Two verification problems comprising large deformation frictional contacts are presented, and the accuracy of the proposed algorithm is validated. Furthermore, the algorithm's performance is investigated via a large-scale contact problem, and the maximum speedups of total computational time and contact calculation reach 28.5 and 77.4, respectively, relative to the commercial finite element software Abaqus/Explicit running on a single-core central processing unit (CPU). Contact calculation accounts for only 18% of the total calculation time with the FPM, much less than the 50% with Abaqus/Explicit, demonstrating the efficiency of the proposed method.
Keywords: finite particle method; graphics processing unit (GPU); parallel computing; contact algorithm; large
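The abstract does not describe the linked-list structure in detail. As a rough sketch of the generic cell linked-list idea often used for a global contact search, the code below bins 2D points into a uniform grid with `head`/`next` index arrays and then gathers candidates from the 3x3 block of cells around a query point. All names, the grid layout, and the assumption of non-negative coordinates are illustrative, not taken from the paper.

```python
def cell_index(x, y, cell_size, nx, ny):
    """Map a point with non-negative coordinates to clamped grid cell (cx, cy)."""
    cx = min(int(x / cell_size), nx - 1)
    cy = min(int(y / cell_size), ny - 1)
    return cx, cy


def build_cell_list(points, cell_size, nx, ny):
    """Bin points into an nx-by-ny grid using flat head/next index arrays."""
    head = [-1] * (nx * ny)   # head[c]: most recently added point in cell c, or -1
    nxt = [-1] * len(points)  # nxt[p]: next point in the same cell as p, or -1
    for p, (x, y) in enumerate(points):
        cx, cy = cell_index(x, y, cell_size, nx, ny)
        c = cy * nx + cx
        nxt[p] = head[c]
        head[c] = p
    return head, nxt


def candidates(point, head, nxt, cell_size, nx, ny):
    """Global search: collect point indices in the 3x3 cells around `point`."""
    cx, cy = cell_index(point[0], point[1], cell_size, nx, ny)
    found = []
    for j in range(max(cy - 1, 0), min(cy + 2, ny)):
        for i in range(max(cx - 1, 0), min(cx + 2, nx)):
            p = head[j * nx + i]
            while p != -1:  # walk the linked list stored in nxt
                found.append(p)
                p = nxt[p]
    return found
```

Flat index arrays like these avoid pointer-based lists, which is one reason such layouts are convenient on a GPU. A local search would then test each candidate pair geometrically.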
On-line Free-viewpoint Video: From Single to Multiple View Rendering
20
Authors: Vincent Nozick, Hideo Saito. International Journal of Automation and Computing (EI), 2008, No. 3, pp. 257-267 (11 pages)
In recent years, many image-based rendering techniques have advanced from static to dynamic scenes and thus become video-based rendering (VBR) methods. However, only a few of them can render new views on-line. We present a new VBR system that creates new views of a live dynamic scene. This system provides high-quality images and does not require any background subtraction. Our method follows a plane-sweep approach and reaches real-time rendering using consumer graphics hardware, namely a graphics processing unit (GPU). Only one computer is used for both acquisition and rendering. The video stream acquisition is performed by at least 3 webcams. We propose an additional video stream management scheme that extends the number of webcams to 10 or more. These considerations make our system low-cost and hence accessible to everyone. We also present an adaptation of our plane-sweep method that creates multiple views of the scene simultaneously in real time. Our system is especially designed for stereovision using autostereoscopic displays. The new views are computed from 4 webcams connected to a computer and are compressed in order to be transferred to a mobile phone. Using GPU programming, our method provides up to 16 images of the scene in real time. The use of both GPU and CPU makes this method work on a single consumer-grade computer.
Keywords: video-based rendering (VBR); free-viewpoint video; view interpolation; graphics processing unit (GPU); webcam; stereovision; autostereoscopic