Abstract
The fast Fourier transform (FFT) is an important research tool in signal processing, image processing, and related fields. The Scalable Vector Extension (SVE) is ARM's next-generation SIMD instruction set for the ARMv8-A architecture. It supports vector registers from 128 to 2048 bits wide and a Vector Length Agnostic (VLA) programming model, which gives it good data parallelism and software portability and makes it well suited to high-performance computing, machine learning, and similar workloads. Existing work on FFT algorithms for ARM SVE has not fully exploited the architecture's features and computing resources. This paper targets one-dimensional complex FFTs whose size is a power of two and improves the algorithm using SVE features such as predicate-driven loop control, non-linear memory access, and complex arithmetic. Experimental results show a clear performance gain over the NEON-based vectorized implementation in the FFTW library: with a vector length of 1024 bits, the average speedup is 5.83 times and the maximum speedup is 9.22 times.
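To make the terms in the abstract concrete, the following is a minimal Python sketch of a power-of-two, one-dimensional complex FFT whose butterfly loop is processed in chunks of `lanes` elements, with the final partial chunk limited by an active-lane count. This only emulates the idea of predicate-driven, vector-length-agnostic loop control in scalar code; it is not the paper's SVE implementation, and the names `fft_vla`, `lanes`, and `naive_dft` are hypothetical, chosen for illustration.

```python
import cmath


def bit_reverse_permute(x):
    """Return a copy of x reordered by bit-reversed index (len(x) is a power of two)."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        r = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        out[r] = complex(x[i])
    return out


def fft_vla(x, lanes=4):
    """Iterative radix-2 FFT; `lanes` stands in for the hardware vector length (assumption)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = bit_reverse_permute(x)
    size = 2
    while size <= n:
        half = size // 2
        # Twiddle factors for this stage.
        w = [cmath.exp(-2j * cmath.pi * k / size) for k in range(half)]
        for start in range(0, n, size):
            j = 0
            while j < half:
                # Emulated predicate: only the lanes still inside the loop bound are active.
                active = min(lanes, half - j)
                for lane in range(active):
                    k = j + lane
                    u = a[start + k]
                    t = w[k] * a[start + k + half]  # complex multiply
                    a[start + k] = u + t
                    a[start + k + half] = u - t
                j += lanes
        size *= 2
    return a


def naive_dft(x):
    """O(n^2) reference DFT used only to check the sketch."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]
```

Because the chunk size only changes how the butterflies are grouped, the result is the same for any value of `lanes`; that independence from the vector width is the portability property a VLA programming model aims for.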
Authors
李凤娇
顾乃杰
齐东升
苏俊杰
LI Feng-jiao; GU Nai-jie; QI Dong-sheng; SU Jun-jie (School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China; Anhui Province Key Laboratory of Computing and Communication Software, University of Science and Technology of China, Hefei 230027, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal (北大核心)
2022, No. 10, pp. 2017-2021 (5 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (61573327).