
Vectorization Study for FFT Algorithm Based on ARM SVE (cited by: 3)
Abstract: The Fast Fourier Transform (FFT) is an important tool in signal processing, image processing, and related fields. The Scalable Vector Extension (SVE) is a new-generation SIMD instruction set that ARM introduced for the ARMv8-A architecture. It supports vector registers from 128 to 2048 bits wide and a Vector Length Agnostic (VLA) programming model, which gives it good data parallelism and software portability and makes it suitable for high-performance computing, machine learning, and similar fields. Existing research on FFT algorithms for ARM SVE has not fully exploited the architecture's features and computing resources. This paper targets the one-dimensional complex FFT with power-of-2 data sizes and improves the algorithm using SVE features such as predicate-driven loop control, non-linear memory access, and complex-number arithmetic. Experimental results show a clear performance gain over the NEON-based vectorized implementation in the FFTW library: at a vector length of 1024 bits, the average performance improvement is 5.83 times and the highest is 9.22 times.
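To make the abstract's two central ideas concrete, the following is a minimal sketch of an iterative radix-2 FFT for power-of-2 complex input whose butterfly loop is written in a vector-length-agnostic style. It is not the paper's implementation and uses no SVE instructions: the `lanes` parameter is a hypothetical stand-in for the number of complex elements per vector register, and the `active` count emulates in plain Python the role a predicate plays in handling the loop tail. All function names are illustrative assumptions.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2_vla(x, lanes=4):
    """Radix-2 decimation-in-time FFT of a power-of-2-length sequence.

    `lanes` (hypothetical) models how many complex elements one vector
    register holds; the result must not depend on it.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    bits = n.bit_length() - 1
    # Reorder the input by bit-reversed index.
    a = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    size = 2
    while size <= n:
        half = size // 2
        # Twiddle factors for this stage.
        tw = [cmath.exp(-2j * cmath.pi * k / size) for k in range(half)]
        for start in range(0, n, size):
            j = 0
            while j < half:
                # Emulated predicate: only the first `active` lanes do work,
                # so no separate scalar tail loop is needed.
                active = min(lanes, half - j)
                for lane in range(active):
                    k = j + lane
                    u = a[start + k]
                    v = a[start + k + half] * tw[k]
                    a[start + k] = u + v
                    a[start + k + half] = u - v
                j += lanes
        size *= 2
    return a


def dft_naive(x):
    """Direct O(n^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
```

Assuming single-precision complex elements (8 bytes each), `lanes=2` would correspond to a 128-bit vector and `lanes=32` to a 2048-bit one. Calling `fft_radix2_vla(x, lanes)` with different `lanes` values should give the same spectrum, which is the portability property the VLA model is meant to provide.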
Authors: LI Feng-jiao; GU Nai-jie; QI Dong-sheng; SU Jun-jie (School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China; Anhui Province Key Laboratory of Computing and Communication Software, University of Science and Technology of China, Hefei 230027, China)
Source: Journal of Chinese Computer Systems (indexed in CSCD and the Peking University Core Journals list), 2022, Issue 10, pp. 2017-2021 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (61573327).
Keywords: FFT; ARM SVE; SIMD assembly optimization; software performance optimization