期刊文献+

大规模集群上多维FFT算法的实现与优化研究 被引量:3

Implementation and Optimization of Multidimensional FFT Algorithm on Large-Scale Clusters
下载PDF
导出
摘要 快速傅里叶变换(fast Fourier transform,FFT)是用于计算离散傅里叶变换(discrete Fourier transform,DFT)或其逆运算的快速算法,在工程、科学和数学领域的应用非常广泛,例如信号分解、数字滤波、图像处理等。因此,在实际应用中对FFT算法进行细粒度优化是非常重要的。研究了FFT算法常用的分解策略以及FFT算法在大规模集群系统上的并行实现,并提出了相关的优化策略。在此基础上,对多种FFT算法在不同平台上进行了性能评估,并分析了各算法的实现、优缺点及其在大规模计算时的可扩展性。实验结果表明,相关研究有助于对现有的FFT算法进行进一步的优化,以及指导如何在大规模CPU+GPU的异构系统上根据不同需求选择实现性能更优的FFT算法。 Fast Fourier transform(FFT)is a fast algorithm which computes the discrete Fourier transform(DFT)of a sequence,or its inverse.Fast Fourier transform is widely used for many applications in engineering,science and mathematics,such as signal decomposition,digital filter and image processing.As a result,the fine-grained optimization of FFT is extremely important in application.This paper studies the typical decomposition strategies of FFT,the parallel implementation of FFT algorithms on large-scale clusters and proposes some optimization strategies.On that basis,this paper also evaluates a variety of FFT algorithms performance on different platforms,then analyzes the implementation,strength and weakness,and the computing scalability on large-scale clusters of these algorithms.The experimental results show that the research of this paper can contribute to the further optimization on existing FFT algorithms,and instruct to choose an FFT algorithm which performance is better on large-scale heterogeneous systems(CPU and GPU)in order to satisfy the specified requirements.
作者 李琨 贾海鹏 曹婷 张云泉 LI Kun;JIA Haipeng;CAO Ting;ZHANG Yunquan(State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences,Beijing 100190, China;School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100190, China)
出处 《计算机科学与探索》 CSCD 北大核心 2017年第6期863-874,共12页 Journal of Frontiers of Computer Science and Technology
基金 国家自然科学基金Nos.61432018 61133005 61272136 61521092 61502450 国家重点研发计划No.2016YFB0200803 中国博士后科学基金No.2015T80139 国家高技术研究发展计划863计划Nos.2015AA01A303 2015AA011505 广东省科技计划项目No.2015B010108006 中科院高效空间天气预报模式科技创新交叉与合作团队项目~~
关键词 集群 快速傅里叶变换(FFT) 消息传递接口(MPI) 性能优化 cluster fast Fourier transforms (FFT) message passing interface (MPI) performance optimization
  • 相关文献

参考文献1

二级参考文献28

  • 1Duhamel P, Vetterli M. Fast fourier transforms: A tutorial review and a state of the art. Signal Processing, 1990, 9(14): 259-299.
  • 2Govindaraju N K, Lloyd B, Dotsenko Y, Smith B, Manferdelli J. High performance discrete Fourier transforms on graphics processors. In Proc. SC, Nov. 2008, Article No.2.
  • 3Nukada A, Matsuoka S. Auto-tuning 3-D FFT library for CUDA GPUs. In Proc. SC, Nov. 2009, Article No.30. Dotsenko Y, Baghsorkhi S S, Lloyd B, Govindaraju N K. Auto-tuning of fast Fourier transform on graphics processors. In Proc PPoPP, Feb. 2011, pp.257-266.
  • 4Gu L, Li X M, Siegel J. An empirically tu:ed 2D and 3D FFT library on CUDA GPU. In Proc. the 2:th ICS, June 2010, pp.305:314.
  • 5Gaster B, Howes L, Kaeli D R, Mistry P, $chaa D. Heteroge- neous Computing with OpenCL. San Fransisco, USA: Morgan Kaufmann: 2011.
  • 6Munshi A, Gaster B, Mattson T G, Fung J, Ginsburg D. OpenCL Programming Guide. Boston, USA: Addison-Wesley Professional. 2011.
  • 7Zhang E Z, Jiang Y L, Guo GPU applications on the fly: Z Y, Shen X P. Streamlining Thread divergence elimination through runtime thread-data remapping. In Proc. the 2.:th ICS, June 2010: pp.115-126.
  • 8Zhang E Z, Jiang Y L, Guo Z Y, Shen X P. Streamlining GPU applications on the fly: Thread divergence elimination through runtime thread-data remapping. In Proc. the 24th ICS, June 2010, pp.115-126.
  • 9Yang Y, Xiang P, Kong J F, Zhou H Y. A GPGPU com- piler for memory optimization and parallelism management. In Proc. PLDI, June 2010, pp.86-97.
  • 10Cooley J W, Tukey J W. An algorithm for the machine cal- culation of complex Fourier series. Mathematics of Compu- tation, 1965, 19: 297-301.

共引文献9

同被引文献35

引证文献3

二级引证文献34

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部