Exploiting Fine-Grained Parallelism with the Fast Inter-Core Communication Mechanism on Multi-Core DSPs (cited by: 2)
Abstract Some digital signal processing programs have strong data dependences, so partitioning them across the cores of a multi-core DSP requires exploiting fine-grained parallelism, which in turn requires support from a fast inter-core communication mechanism. This paper proposes a new fast inter-core communication mechanism for multi-core DSPs: the Tagged Shared Register File (TSRF). The TSRF is shared by all DSP cores, and each of its registers is associated with a valid tag bit that provides low-cost synchronization support for inter-core communication. We built a cycle-accurate simulator of a multi-core DSP prototype that integrates TSRF; the prototype contains four DSP cores. Through detailed simulation, we evaluated TSRF with digital signal processing algorithms that have strong data dependences: IIR filter and ADPCM encoder/decoder. Compared with a single-core DSP, the multi-core DSP with TSRF attains speedups of about 1.8, 1.2, and 1.9, respectively.
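The tag-bit synchronization described in the abstract can be illustrated with a toy software model. The abstract does not spell out the read and write semantics, so this sketch assumes a full/empty discipline: a write stalls while the tag is set, a read stalls while it is clear, and a successful read clears the tag. The names `TSRF`, `try_write`, `try_read`, and `run_pipeline` are hypothetical and do not come from the paper, and the cycle loop is far simpler than a cycle-accurate simulator.

```python
from typing import List, Optional, Tuple


class TSRF:
    """Toy tagged shared register file (semantics are an assumption)."""

    def __init__(self, size: int = 4) -> None:
        self.values = [0] * size
        self.valid = [False] * size  # one tag bit per register

    def try_write(self, idx: int, value: int) -> bool:
        """Write if the register is empty; return False if the writer must stall."""
        if self.valid[idx]:
            return False
        self.values[idx] = value
        self.valid[idx] = True
        return True

    def try_read(self, idx: int) -> Optional[int]:
        """Read and clear the tag if data is present; return None to signal a stall."""
        if not self.valid[idx]:
            return None
        self.valid[idx] = False
        return self.values[idx]


def run_pipeline(data: List[int], consumer_period: int = 2) -> Tuple[List[int], int]:
    """Step a producer core and a slower consumer core that share register 0."""
    tsrf = TSRF()
    pending = list(data)
    received: List[int] = []
    producer_stalls = 0
    cycle = 0
    while len(received) < len(data):
        # Producer core: attempts one write per cycle.
        if pending:
            if tsrf.try_write(0, pending[0]):
                pending.pop(0)
            else:
                producer_stalls += 1
        # Consumer core: attempts a read only every consumer_period cycles.
        if cycle % consumer_period == consumer_period - 1:
            value = tsrf.try_read(0)
            if value is not None:
                received.append(value)
        cycle += 1
    return received, producer_stalls


if __name__ == "__main__":
    out, stalls = run_pipeline([3, 1, 4, 1, 5])
    print(out, stalls)
```

In this model the tag bit alone orders the two cores: no lock or barrier is needed, and values arrive in the order they were sent even though the consumer runs at half the producer's rate.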
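Authors Fang Xing (方兴), Chen Shuming (陈书明)
Source Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 4, pp. 130-133 (4 pages)
Funding National Natural Science Foundation of China (grant 60473079)
Keywords multi-core DSP; inter-core communication mechanism; fine-grained parallelism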