An FFT Performance Model for Optimizing General-Purpose Processor Architecture

An FFT Performance Model for Optimizing General-Purpose Processor Architecture

导出

摘要 General-purpose processor （GPP） is an important platform for fast Fourier transform （FFT）,due to its flexibility,reliability and practicality.FFT is a representative application intensive in both computation and memory access,optimizing the FFT performance of a GPP also benefits the performances of many other applications.To facilitate the analysis of FFT,this paper proposes a theoretical model of the FFT processing.The model gives out a tight lower bound of the runtime of FFT on a GPP,and guides the architecture optimization for GPP as well.Based on the model,two theorems on optimization of architecture parameters are deduced,which refer to the lower bounds of register number and memory bandwidth.Experimental results on different processor architectures （including Intel Core i7 and Godson-3B） validate the performance model.The above investigations were adopted in the development of Godson-3B,which is an industrial GPP.The optimization techniques deduced from our performance model improve the FFT performance by about 40%,while incurring only 0.8% additional area cost.Consequently,Godson-3B solves the 1024-point single-precision complex FFT in 0.368 μs with about 40 Watt power consumption,and has the highest performance-per-watt in complex FFT among processors as far as we know.This work could benefit optimization of other GPPs as well. General-purpose processor （GPP） is an important platform for fast Fourier transform （FFT）,due to its flexibility,reliability and practicality.FFT is a representative application intensive in both computation and memory access,optimizing the FFT performance of a GPP also benefits the performances of many other applications.To facilitate the analysis of FFT,this paper proposes a theoretical model of the FFT processing.The model gives out a tight lower bound of the runtime of FFT on a GPP,and guides the architecture optimization for GPP as well.Based on the model,two theorems on optimization of architecture parameters are deduced,which refer to the lower bounds of register number and memory bandwidth.Experimental results on different processor architectures （including Intel Core i7 and Godson-3B） validate the performance model.The above investigations were adopted in the development of Godson-3B,which is an industrial GPP.The optimization techniques deduced from our performance model improve the FFT performance by about 40%,while incurring only 0.8% additional area cost.Consequently,Godson-3B solves the 1024-point single-precision complex FFT in 0.368 μs with about 40 Watt power consumption,and has the highest performance-per-watt in complex FFT among processors as far as we know.This work could benefit optimization of other GPPs as well.

作者李玲陈云霁刘道福钱诚胡伟武

机构地区 Institute of Computing Technology Loongson Technologies Corporation Limited

出处《Journal of Computer Science & Technology》 SCIE EI CSCD 2011年第5期875-889,共15页 计算机科学技术学报（英文版）

基金 supported by the National Science and Technology Major Project under Grant Nos.2009ZX01028-002-003,2009ZX01029-001-003,2010ZX01036-001-002 the National Natural Science Foundation of China under Grant Nos.61050002,61003064,60921002

关键词 fast Fourier transform （FFT） general-purpose processor （GPP） performance prediction model vector unit DMA fast Fourier transform （FFT） general-purpose processor （GPP） performance prediction model vector unit DMA

分类号 TP332 [自动化与计算机技术—计算机系统结构]

引文网络
相关文献

参考文献27

1Frigo M, Johnson S. The design and implementation of FFTW3. Proceedings of the IEEE, Feb. 2005, 93(2): 216- 231.
2Franchetti F, Piischel M, Voronenko Y, Chellappa S, Moura J M F. Discrete Fourier transform on multicore. IEEE Signal Processing Magazine, 2009, 26(6): 90-102.
3Li Y, Zhao L, Lin H, Chow A C, Diamond J R. A performance model for fast Fourier transform. In Proc. the 23rd International Symposium on Parallel and Distributed Processing, Rome, Italy, May 23-29, 2009, pp.1-11.
4Fraguela B B, Voronenko Y, Puschel M. Automatic tuning of discrete Fourier transforms driven by analytical modeling. In Proc. the 18th International Conference on Parallel Architectures and Compilation Techniques (PACT2009). Raleigh, USA, Sept. 12-16, 2009, pp.271-280.
5Norton A, Silberger A J. Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared-memory architectures. IEEE Transactions on Computers, 1987, C-36(5): 581-591.
6Cvetanovic Z. Performance analysis of the FFT algorithm on a shaved-memory parallel architecture. IBM Journal of Research and Development, 1987, 31(4): 435-451.
7Gu L, Li X. DFT performance prediction in FFTW. In Proc. the 22nd International Workshop on Languages and Compilers for Parallel Computing (LCPC), Newark, USA, Oct. 8-10, 2009.
8Pagiamtzis K, Kulak P G. Empirical performance prediction for IFFT/FFT cores for OFDM systems-on-a-chip. In Proc. the 45th MWSCAS, Tulsa, USA, Aug. 4-7, 2002.
9Singer B, Veloso M. Learning to construct fast signal processing implementations. Journal of Machine Learning Research, 2003, 3: 887-919.
10Sepiashvili D. Performance models and search methods for optimal FFT implementations. [Master's Thesis]. Carnegie Mellon University, 2000.

1GPRS:创建移动的信息社会[J].广播电视信息,2001,8(7):58-61.
2杨旭霞,张毅,张鹏.基于FFT分析振动信号的机械故障诊断[J].山西电子技术,2015(2):6-7. 被引量：5
3陈铭,陈俊.基于单片机AT89C51的数据采集系统设计[J].中国水运（下半月）,2008,8(10):132-133.
4范明强.柴油机用燃烧压力传感器[J].汽车与配件,2006(13):25-25.
5黄天祺.GPU通用计算的发展及其应用领域综述[J].计算机光盘软件与应用,2011(8):36-36. 被引量：1
6Series6XT：GPU架构[J].世界电子元器件,2014(3):26-26.
7蒋青松,谢庆年,汤春球.基于LabVIEW的旋转机械阶比分析[J].武汉理工大学学报（信息与管理工程版）,2011,33(3):418-420. 被引量：2
8马秀领,田书林.基于QDRII的一种PXI任意波形发生器数字部分的设计[J].电子质量,2009(6):27-29.
9陶小峰,侯延昭,王凯栋,何海洋,郭英杰.GPP-Based Soft Base Station Designing and Optimization[J].Journal of Computer Science & Technology,2013,28(3):420-428. 被引量：1
10田黎育,孙密,万阳良.Implementation of a kind of FPGA-based binary phase coded radar signal processor architecture[J].Journal of Beijing Institute of Technology,2012,21(4):526-531. 被引量：1

Journal of Computer Science & Technology

2011年第5期

浏览历史

内容加载中请稍等...

An FFT Performance Model for Optimizing General-Purpose Processor Architecture

参考文献27

相关作者

相关机构

相关主题

浏览历史