Abstract
The fast Fourier transform (FFT) is an algorithm widely used in science and engineering. Real-world applications involve large data sizes, so an efficient implementation is needed to meet practical requirements. To study how FFT can be implemented efficiently with a heterogeneous programming model, this work uses the Huawei Kunpeng processor and the Ascend AI accelerator chip as the experimental platform and SYCL as the heterogeneous programming language. It presents an implementation method and optimization strategies for the Cooley-Tukey radix-2 decimation-in-time FFT algorithm, and proposes a data split-and-recombine optimization algorithm that greatly improves utilization of the hardware's parallel capability. Implementing FFT with a heterogeneous programming model makes better use of the performance advantages of heterogeneous computing devices, is easy to program, and offers higher compatibility. Tests show that, at a certain data scale, the optimized algorithm runs 220.39 times faster than the unoptimized version, which demonstrates the effectiveness of the optimization techniques.
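For readers unfamiliar with the algorithm named in the abstract, the following is a minimal host-side sketch of a Cooley-Tukey radix-2 decimation-in-time FFT in plain C++. It is not the authors' SYCL implementation and does not include their data split-and-recombine optimization, which the abstract does not describe in detail. The function name `fft_radix2_dit` and the alias `cd` are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;  // hypothetical alias for this sketch

// In-place iterative radix-2 decimation-in-time FFT (forward transform).
// Precondition: a.size() is a power of two.
void fft_radix2_dit(std::vector<cd>& a) {
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);

    // Step 1: reorder the input into bit-reversed index order.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }

    // Step 2: log2(n) butterfly stages; sub-transform length doubles each stage.
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double theta = -2.0 * pi / static_cast<double>(len);
        // Within one stage, the butterflies touch disjoint pairs of elements,
        // so they are independent of each other.
        for (std::size_t start = 0; start < n; start += len) {
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cd w = std::polar(1.0, theta * static_cast<double>(k));
                const cd u = a[start + k];
                const cd v = a[start + k + len / 2] * w;
                a[start + k] = u + v;
                a[start + k + len / 2] = u - v;
            }
        }
    }
}
```

Because the butterflies inside a single stage are mutually independent, each stage is a natural candidate for a data-parallel kernel on an accelerator. How the paper actually maps stages to SYCL kernels is not stated in the abstract.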
Authors
李亚美
陈莉丽
王锋
胡畅
LI Yamei; CHEN Lili; WANG Feng; HU Chang (College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; National Innovation Institute of Defense Technology, Academy of Military Science, Beijing 100071, China)
Funding
National Key Research and Development Program of China (2022YFA1004303).