
Two-dimensional FFT parallel implementation and optimization on FT-X DSP platform
Abstract: The two-dimensional FFT is a typical image-processing algorithm, widely used in image filtering, fast convolution, target tracking, and related fields. To meet the real-time processing requirements of high-resolution images, a multi-core parallel implementation of the 2D FFT is proposed for the self-developed FT-X many-core DSP processor. Based on the many-core programming model, task initialization is completed through multi-core task deployment and address space remapping, and data-parallel processing across 24 cores is achieved with a speedup of 19.8×. On this basis, an implicit transpose scheme based on DMA strided transfer is proposed; by controlling how matrix addresses are allocated, it resolves the limit on stride length in strided transfers of large matrices. Experimental results show that, at a data size of 8K × 8K, the scheme reduces transpose time by 91% compared with direct transposition and by 65% compared with instruction-based implicit transposition. In addition, a multi-core load imbalance that arises in one special case is identified and resolved: the gap in execution time between cores falls from 64% to 12%, and overall execution time falls by 26%.
Authors: ZHAN Yimeng; HU Xiao; GUO Yang (College of Computer, National University of Defense Technology, Changsha 410073, Hunan, China)
Source: Microelectronics & Computer, 2023, No. 2, pp. 71-78 (8 pages)
Funding: National Science and Technology Major Project (2017-V-0014-0066).
Keywords: two-dimensional FFT; multi-core parallelism; transpose; DMA strided transfer; load balancing
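
The row-column structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' FT-X implementation: the function names (`strided_read`, `fft2_flat`, `partition_rows`) are hypothetical, the strided gather is only a software stand-in for a DMA strided transfer, and the even row split is a generic partitioning rule rather than the load-balancing fix reported in the paper.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def strided_read(buf, start, stride, count):
    """Hypothetical stand-in for a strided DMA read:
    gather `count` elements spaced `stride` apart."""
    return buf[start:start + stride * count:stride]


def fft2_flat(buf, rows, cols):
    """2D FFT of a row-major flat buffer.

    Rows are transformed from contiguous reads; columns are fetched with
    stride `cols`, so no explicitly transposed copy of the matrix is built.
    """
    work = list(buf)
    # Pass 1: 1D FFT over every row (contiguous access).
    for r in range(rows):
        work[r * cols:(r + 1) * cols] = fft(work[r * cols:(r + 1) * cols])
    # Pass 2: 1D FFT over every column (strided gather, strided scatter).
    for c in range(cols):
        work[c::cols] = fft(strided_read(work, c, cols, rows))
    return work


def partition_rows(rows, cores):
    """Split `rows` into `cores` contiguous ranges whose sizes differ by at most 1."""
    base, extra = divmod(rows, cores)
    ranges, start = [], 0
    for core in range(cores):
        size = base + (1 if core < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

For example, `partition_rows(8192, 24)` assigns 342 rows to each of the first 8 ranges and 341 rows to each of the remaining 16, since 8192 is not a multiple of 24. On real hardware each range would be handled by one core, and the column pass would be fed by DMA transfers rather than Python slices.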
