Optimization of Digital Signal Transformation Functions in Multicluster VLIW DSP (cited by: 2)
Abstract: Targeting the architecture of the BWDSP100 processor, a multicluster very long instruction word (VLIW) DSP, this paper parallelizes the digital signal transformation functions in the BWDSP100 Digital Signal Processor (DSP) function library. Building on loop unrolling, instruction scheduling, software vectorization, and software pipelining, it applies special hardware instructions, zero-overhead loops, and instruction-level reordering and parallelism, and it implements a parallel optimized version of each function based on the library's original sequential version. Experimental results show that, in four-macro parallel mode, every function achieves a speedup of more than 9x, 90% of the functions exceed 10x, and the average speedup is 11.12x.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2016, No. 3, pp. 47-52 (6 pages).
Funding: Program of Introducing Talents of Discipline to Universities (B07033); Anhui Provincial Natural Science Foundation project "Research on Parallel Deployment and Optimization Strategies for Deep Neural Networks on GPU Clusters" (1408085MKL06).
Keywords: Very Long Instruction Word (VLIW); Single Instruction Multiple Data (SIMD); Digital Signal Processor (DSP); loop unrolling; parallelization; multicluster
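
The abstract names loop unrolling as one of the techniques used to expose instruction-level parallelism. The record does not show any of the paper's BWDSP100 code, so the following is only a generic C sketch of the idea, not the authors' implementation. The function names `dot_sequential` and `dot_unrolled4` are hypothetical, and the unroll factor of 4 is an assumption chosen for illustration.

```c
#include <stddef.h>

/* Reference version: one multiply-accumulate per iteration.
 * Each iteration depends on the previous value of acc. */
float dot_sequential(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Unrolled by 4 with four independent accumulators (hypothetical
 * illustration). The four multiply-accumulates in the loop body do not
 * depend on each other, so a scheduler for a wide-issue machine is free
 * to place them in parallel slots. */
float dot_unrolled4(const float *a, const float *b, size_t n) {
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;

    /* Main loop: processes four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }

    /* Combine the partial sums. */
    float acc = (acc0 + acc1) + (acc2 + acc3);

    /* Remainder loop: handles the last n % 4 elements. */
    for (; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

Because the unrolled version adds the products in a different order, its floating-point result can differ from the sequential one in the last bits for general inputs; the two agree exactly when every intermediate sum is exactly representable.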