Journal articles
4 articles found
Exploiting Parallelism in the Simulation of General Purpose Graphics Processing Unit Program
1
Authors: 赵夏, 马胜, 陈微, 王志英. Journal of Shanghai Jiaotong University (Science), 2016, no. 3, pp. 280-288 (9 pages). Indexed in: EI
Simulation is an important means of evaluating the performance of computer architectures. Currently, serial simulation of general purpose graphics processing unit (GPGPU) architectures is the main bottleneck limiting simulation speed. To address this issue, we propose intra-kernel parallelization on a multicore processor and inter-kernel parallelization on a multiple-machine platform, and apply both methods to the GPGPU-sim simulator. The intra-kernel parallelization method first parallelizes the serial simulation of multiple compute units within one cycle. It then parallelizes the timing and functional simulation to reduce the performance loss caused by synchronization between different compute units. The inter-kernel parallelization method divides the multiple kernels of a CUDA program into several groups and distributes these groups across multiple simulation hosts. Experimental results show that the intra-kernel parallelization method achieves a speedup of up to 12× with a maximum error rate of 0.0094% on a 32-core machine, and the inter-kernel parallelization method accelerates simulation by up to 3.9× with a maximum error rate of 0.11% on four simulation hosts. Because the two methods are orthogonal, they can be combined on multiple multicore hosts for further performance improvement.
Keywords: general purpose graphics processing unit (GPGPU); multicore; intra-kernel; inter-kernel; parallel
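The intra-kernel scheme above steps every compute unit of one simulated cycle in parallel and synchronizes before the next cycle begins. The following is a minimal Python sketch of that per-cycle fork/join structure, not GPGPU-sim code: `ComputeUnit`, its toy `cycle` rule, and the serial drain into `interconnect` are hypothetical stand-ins. Python threads will not deliver real speedup because of the GIL; the sketch only shows the structure, in which units touch private state during the parallel phase and shared state is updated serially in a fixed order, so the parallel run matches a serial one.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class ComputeUnit:
    """Hypothetical stand-in for one simulated compute unit (e.g. an SM)."""
    uid: int
    pc: int = 0
    retired: int = 0
    outbox: list = field(default_factory=list)

    def cycle(self, now: int) -> None:
        # Touches only this unit's private state, so units can be stepped concurrently.
        self.pc += 1
        if (self.pc + self.uid) % 3 == 0:  # toy rule: "retire" an instruction
            self.retired += 1
            self.outbox.append((now, self.uid))


def simulate(num_units: int, num_cycles: int, workers: int):
    units = [ComputeUnit(i) for i in range(num_units)]
    interconnect = []  # shared state, written only in the serial phase
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for now in range(num_cycles):
            # Parallel phase: step every compute unit for this cycle.
            futures = [pool.submit(u.cycle, now) for u in units]
            # Barrier: wait for all units before the cycle ends (re-raises errors).
            for fut in futures:
                fut.result()
            # Serial phase: drain outboxes in unit order so results are deterministic.
            for u in units:
                interconnect.extend(u.outbox)
                u.outbox.clear()
    return [u.retired for u in units], interconnect


serial = simulate(num_units=8, num_cycles=6, workers=1)
parallel = simulate(num_units=8, num_cycles=6, workers=4)
print(serial == parallel)  # True
```

Draining the outboxes in a fixed order is what keeps the parallel result identical to the serial one; a real simulator would additionally have to handle units that contend for shared caches or the interconnect within a cycle.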
A multi-scale architecture for multi-scale simulation and its application to gas-solid flows (Cited by: 1)
2
Authors: Bo Li, Guofeng Zhou, Wei Ge, Limin Wang, Xiaowei Wang, Li Guo, Jinghai Li. Particuology, 2014, no. 4, pp. 160-169 (10 pages). Indexed in: SCIE, EI, CAS, CSCD
A multi-scale hardware and software architecture implementing the EMMS (energy-minimization multi-scale) paradigm is demonstrated to be effective in simulating a two-dimensional gas-solid suspension. General purpose CPUs are employed for macro-scale control and optimization, and many integrated cores (MICs) operating in multiple-instruction multiple-data mode are used for molecular dynamics simulation of the solid particles at the meso-scale. Many-core processors operating in single-instruction multiple-data mode, such as general purpose graphics processing units (GPGPUs), are employed for direct numerical simulation of the fluid flow at the micro-scale using the lattice Boltzmann method. This architecture is also expected to be efficient for multi-scale simulation of other complex systems.
Keywords: general purpose graphics processing unit (GPGPU); many integrated core (MIC); meso-science; multiple-instruction multiple-data; single-instruction multiple-data; virtual process engineering
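The micro-scale fluid solver above uses the lattice Boltzmann method, which suits single-instruction multiple-data hardware because every lattice cell performs the same local collision followed by a fixed-pattern streaming step. The sketch below shows one collide-and-stream step in plain, serial Python. The D2Q9 lattice, the BGK collision operator, the relaxation time `tau`, and periodic boundaries are illustrative assumptions, not details taken from the paper; on a GPGPU, the per-cell loop body would typically map to one thread.

```python
# D2Q9 lattice: discrete velocities (cx, cy) and their standard weights.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution for density rho and velocity (ux, uy)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def lbm_step(f, tau):
    """One collide-and-stream step. f[y][x] holds the 9 populations of a cell."""
    ny, nx = len(f), len(f[0])
    new = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            cell = f[y][x]
            rho = sum(cell)
            ux = sum(fi * cx for fi, (cx, _) in zip(cell, C)) / rho
            uy = sum(fi * cy for fi, (_, cy) in zip(cell, C)) / rho
            feq = equilibrium(rho, ux, uy)
            for i, (cx, cy) in enumerate(C):
                post = cell[i] - (cell[i] - feq[i]) / tau  # BGK relaxation
                # Push the post-collision value to the neighbor (periodic wrap).
                new[(y + cy) % ny][(x + cx) % nx][i] = post
    return new


def totals(f):
    """Total mass and x-momentum, both conserved by collision and streaming."""
    mass = sum(sum(cell) for row in f for cell in row)
    mom_x = sum(fi * cx for row in f for cell in row for fi, (cx, _) in zip(cell, C))
    return mass, mom_x


# Small periodic grid with a checkerboard density perturbation and uniform flow.
nx, ny = 6, 4
f = [[equilibrium(1.0 + 0.01 * ((x + y) % 2), 0.02, 0.0) for x in range(nx)]
     for y in range(ny)]
before = totals(f)
for _ in range(10):
    f = lbm_step(f, tau=0.8)
print(before, totals(f))
```

Mass and momentum should stay constant to rounding error, which makes a quick sanity check for any port of the same step to GPU code.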
Optimizing non-coalesced memory access for irregular applications with GPU computing
3
Authors: Ran ZHENG, Yuan-dong LIU, Hai JIN. Frontiers of Information Technology & Electronic Engineering, 2020, no. 9, pp. 1285-1301 (17 pages). Indexed in: SCIE, EI, CSCD
General purpose graphics processing units (GPGPUs) can considerably improve computing performance for regular applications. However, many applications exhibit irregular memory access, and the benefits of graphics processing units (GPUs) are less substantial for such irregular applications. In recent years, several studies have presented solutions that remove static irregular memory access, but eliminating dynamic irregular memory access in software remains a serious challenge. A pure software solution, requiring neither hardware extensions nor offline profiling, is proposed to eliminate dynamic irregular memory access, especially indirect memory access. Data reordering and index redirection are used to reduce the number of memory transactions, thereby improving the performance of GPU kernels. To make data reordering more efficient, the reordering operation is offloaded to the GPU, reducing overhead and data transfer. By concurrently executing the compute unified device architecture (CUDA) streams for data reordering and for the data processing kernel, the overhead of data reordering is further reduced. With these optimizations, the volume of memory transactions is reduced by 16.7%-50% compared with CUSPARSE-based benchmarks, and the performance of irregular kernels is improved by 9.64%-34.9% on an NVIDIA Tesla P4 GPU.
Keywords: general purpose graphics processing units; memory coalescing; non-coalesced memory access; data reordering
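For an indirect access pattern such as `out[i] = data[idx[i]]`, the general idea of data reordering plus index redirection can be illustrated as follows: copy `data` into the order in which threads first touch it, then rewrite `idx` to point at the new positions so that neighboring threads read neighboring elements. This is a serial Python illustration under stated assumptions, not the authors' algorithm. The first-touch ordering, the warp size of 32, and the transaction model (one transaction per aligned 32-element segment touched by a warp, i.e. 128-byte segments of 4-byte words) are hypothetical simplifications.

```python
import random

WARP = 32     # threads per warp (assumed)
SEGMENT = 32  # elements per memory transaction (assumed: 128 B of 4-byte words)


def transactions(idx):
    """Count distinct aligned segments each warp touches, summed over warps."""
    total = 0
    for start in range(0, len(idx), WARP):
        total += len({j // SEGMENT for j in idx[start:start + WARP]})
    return total


def reorder(data, idx):
    """Data reordering + index redirection (hypothetical first-touch scheme)."""
    new_pos = {}
    perm = []
    for j in idx:  # first-touch order of the indirect accesses
        if j not in new_pos:
            new_pos[j] = len(perm)
            perm.append(j)
    for j in range(len(data)):  # untouched elements go to the end
        if j not in new_pos:
            new_pos[j] = len(perm)
            perm.append(j)
    new_data = [data[j] for j in perm]   # reordered copy of the data
    new_idx = [new_pos[j] for j in idx]  # redirected indices
    return new_data, new_idx


random.seed(0)
n = 1024
data = [float(v) for v in range(n)]
idx = list(range(n))
random.shuffle(idx)  # scattered, non-coalesced gather pattern
new_data, new_idx = reorder(data, idx)

# The gathered values are unchanged; only their placement in memory differs.
assert [data[j] for j in idx] == [new_data[k] for k in new_idx]
print(transactions(idx), "->", transactions(new_idx))
```

The reordering itself costs a full pass over the data, which is why offloading it to the GPU and overlapping it with the processing kernel via CUDA streams, as the abstract describes, matters for the net gain.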
GPGPU Accelerated Fast Convolution Back-Projection for Radar Image Reconstruction
4
Authors: 周斌, 彭应宁, 叶春茂, 汤俊. Tsinghua Science and Technology, 2011, no. 3, pp. 256-263 (8 pages). Indexed in: SCIE, EI, CAS
This paper describes the design of a parallel fast convolution back-projection algorithm for radar image reconstruction. State-of-the-art general purpose graphics processing units (GPGPUs) were used to accelerate the processing. The implementation performs much better than conventional processing systems, achieving a speedup of more than 890× on NVIDIA Tesla C1060 supercomputing cards compared with an Intel P4 2.4 GHz CPU. A 256×256-pixel image can be reconstructed within 6.3 s, which makes real-time imaging possible. Six platforms were tested and compared. The results show that GPGPU supercomputing systems have great potential for radar image processing.
Keywords: convolution back projection (CBP); synthetic aperture radar (SAR); inverse synthetic aperture radar (ISAR); general purpose graphics processing units (GPGPU)
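Back-projection forms each image pixel independently: for every pulse, it computes the antenna-to-pixel range, interpolates the range-compressed echo at that range, compensates the propagation phase, and accumulates. That per-pixel independence is what makes the approach map well onto GPGPU threads. Below is a minimal direct (not fast) time-domain back-projection sketch in plain Python. It assumes 2-D geometry, complex range profiles sampled uniformly from `r0` with spacing `dr`, linear interpolation, and a carrier wavelength `wavelength`; all names are hypothetical, and the paper's fast convolution back-projection variant is not reproduced here.

```python
import cmath
import math


def backproject(profiles, positions, r0, dr, wavelength, xs, ys):
    """Direct time-domain back-projection (hypothetical 2-D sketch).

    profiles[p][k]: complex range-compressed sample of pulse p at range r0 + k*dr.
    positions[p]:   (x, y) antenna position for pulse p.
    Returns image[iy][ix] as complex values.
    """
    image = []
    for y in ys:
        row = []
        for x in xs:  # each pixel is independent: one GPU thread per pixel
            acc = 0j
            for prof, (px, py) in zip(profiles, positions):
                r = math.hypot(x - px, y - py)
                k = (r - r0) / dr
                k0 = math.floor(k)
                if 0 <= k0 < len(prof) - 1:
                    t = k - k0
                    sample = (1 - t) * prof[k0] + t * prof[k0 + 1]  # linear interpolation
                    # Undo the two-way propagation phase exp(-j*4*pi*r/wavelength).
                    acc += sample * cmath.exp(1j * 4 * math.pi * r / wavelength)
            row.append(acc)
        image.append(row)
    return image


# Toy scene: one point target at the origin, antenna on an arc of radius 100.
wavelength, r0, dr, nbins = 0.03, 90.0, 0.5, 41
target_range = 100.0
positions = [(100 * math.cos(a), 100 * math.sin(a))
             for a in (-0.3 + 0.04 * p for p in range(16))]
profiles = []
for _ in positions:  # every pulse sees the target at the same range
    phase = cmath.exp(-1j * 4 * math.pi * target_range / wavelength)
    profiles.append([max(0.0, 1 - abs(r0 + k * dr - target_range) / dr) * phase
                     for k in range(nbins)])

xs = ys = [-1.0 + 0.5 * i for i in range(5)]
img = backproject(profiles, positions, r0, dr, wavelength, xs, ys)
mags = [[abs(v) for v in row] for row in img]
print(max((m, iy, ix) for iy, row in enumerate(mags) for ix, m in enumerate(row)))
```

Contributions add coherently only at the target pixel; elsewhere the compensated phases disagree across pulses and largely cancel.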