
Design and implementation of large-point 1D FFT on GPU (cited by: 4)
Abstract: Given the GPU's strong computing performance and advanced parallel processor architecture, this paper studies a parallel design method that maps a parallel FFT algorithm onto the CUDA model. The method is optimized according to the main design guidelines for GPU platforms: reducing global memory accesses in kernel functions, coalescing global memory accesses, using shared memory efficiently, and keeping computational intensity high. A large-point 1D FFT is implemented on a Tesla C2075 GPU, which is based on the NVIDIA Fermi architecture. Experimental results show that the method is feasible and efficient: for sizes up to 256K points it outperforms the CUFFT library, with a maximum speedup of 2.1 times over CUFFT 4.0.
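The abstract does not say how the transform is split into GPU passes. As an illustration only, the sketch below assumes a common decomposition for large FFTs (the "four-step" Cooley-Tukey factorization, N = n1 * n2), in which each small sub-transform is short enough to be handled on-chip and global memory is touched only between passes. It is plain Python on the CPU, not CUDA, and not necessarily the authors' method; the names `dft` and `four_step_fft` are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT; stands in for a small FFT that a thread block
    could compute entirely in shared memory (assumption for illustration)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def four_step_fft(x, n1, n2):
    """Length n1*n2 DFT built from n2 DFTs of length n1, a twiddle
    multiplication, and n1 DFTs of length n2.

    Input index:  j = j1 * n2 + j2
    Output index: k = k1 + n1 * k2
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Pass 1: for each j2, transform the strided column x[j1*n2 + j2] over j1.
    # On a GPU this would be one read of the input from global memory.
    a = [dft([x[j1 * n2 + j2] for j1 in range(n1)]) for j2 in range(n2)]

    # Twiddle step: scale a[j2][k1] by exp(-2*pi*i * j2 * k1 / n).
    for j2 in range(n2):
        for k1 in range(n1):
            a[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)

    # Pass 2: for each k1, transform over j2 and scatter to k = k1 + n1*k2.
    out = [0j] * n
    for k1 in range(n1):
        b = dft([a[j2][k1] for j2 in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = b[k2]
    return out
```

For example, `four_step_fft(x, 3, 4)` on a 12-point input should agree with `dft(x)` up to floating-point rounding. In a GPU setting, the appeal of this kind of split is that the number of round trips to global memory depends on the number of passes rather than on the number of butterfly stages.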
Authors: 何涛 (He Tao), 朱岱寅 (Zhu Daiyin)
Source: 《计算机工程与科学》 (Computer Engineering & Science), indexed in CSCD and the Peking University Core Journals list (北大核心), 2013, No. 11, pp. 34-41 (8 pages)
Keywords: CUDA 4.0; fast Fourier transform; GPU; high-performance computing
