Design of Cache Miss Pipelining in YHFT-DX High Performance DSP (cited by: 2)
Abstract: YHFT-DX is a high-performance DSP developed independently by the National University of Defense Technology. With the goal of improving the Cache performance of YHFT-DX, this paper studies optimization strategies for reducing Cache miss latency, and designs and implements miss pipelining, an optimization strategy for the level-1 data Cache of high-frequency, high-performance DSPs. Compared with traditional optimization strategies, miss pipelining handles consecutive Cache miss requests in a pipelined fashion so that the latencies of multiple misses overlap, which reduces the average Cache miss penalty. Applied to the design and optimization of the level-1 data Cache controller of the YHFT-DX chip, the strategy reduces the pipeline stall caused by a Cache miss from 8 cycles to 2 cycles and significantly improves system performance.
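The overlap described in the abstract can be illustrated with a toy cycle-count model. This is only a sketch under stated assumptions, not the controller design from the paper: it assumes that a single miss costs 8 stall cycles, that with miss pipelining a further consecutive miss can be issued every 2 cycles, and that these are the right way to read the abstract's "8 cycles" and "2 cycles". The function names and the `issue_interval` parameter are hypothetical.

```python
def blocking_stall(num_misses: int, miss_latency: int) -> int:
    """Total stall cycles when every miss is fully serviced before the next one starts."""
    return num_misses * miss_latency


def pipelined_stall(num_misses: int, miss_latency: int, issue_interval: int) -> int:
    """Total stall cycles when consecutive misses overlap.

    Assumption: the first miss pays the full latency, and each later miss
    completes issue_interval cycles after the previous one.
    """
    if num_misses == 0:
        return 0
    return miss_latency + (num_misses - 1) * issue_interval


def average_stall(total_stall: int, num_misses: int) -> float:
    """Average stall cycles per miss (0.0 when there are no misses)."""
    return total_stall / num_misses if num_misses else 0.0


if __name__ == "__main__":
    MISS_LATENCY = 8    # cycles per miss, taken from the abstract
    ISSUE_INTERVAL = 2  # assumed spacing between overlapped misses
    for n in (1, 4, 16, 64):
        blocked = blocking_stall(n, MISS_LATENCY)
        piped = pipelined_stall(n, MISS_LATENCY, ISSUE_INTERVAL)
        print(n, average_stall(blocked, n), average_stall(piped, n))
```

In this model the average stall per miss stays at the full latency without overlap, and approaches `issue_interval` as the run of consecutive misses gets longer. An isolated miss gains nothing, which matches the abstract's emphasis on consecutive accesses.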
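Source: Journal of National University of Defense Technology (indexed in EI, CAS, CSCD, PKU Core), 2009, No. 6, pp. 6-11 (6 pages)
Funding: National Natural Science Foundation of China (60573173); Program for New Century Excellent Talents (NCET); Ministry of Education Innovative Research Team in "High-Performance Microprocessor Technology" (IRT0614)
Keywords: DSP (Digital Signal Processor); miss pipelining; non-blocking Cache; data prefetching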

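References (9)

  • 1 Hennessy J L, Patterson D A. Computer Architecture: A Quantitative Approach[M]. Third Edition. Beijing: Publishing House of Electronics Industry, 2004: 257-258.
  • 2 Sule A M. Design of Pipeline Fast Fourier Transform Processors Using 3 Dimensional Integrated Circuit Technology[D]. PhD Thesis, NCSU, 2007.
  • 3 Tian Liyu. TMS320C6000 Series DSP Programming Tools and Guide[M]. Beijing: Tsinghua University Press, 2006: 1-676.
  • 4 Hennessy J L, Patterson D A. Computer Architecture: A Quantitative Approach[M]. Fourth Edition. Beijing: Publishing House of Electronics Industry, 2007: 412-413.
  • 5 Chen Shuming, Li Zhentao, Wan Jianghua, Hu Dinglei, Guo Yang, Wang Dong, Hu Xiao, Sun Shuwei. Research progress on the "YinHe FeiTeng" high-performance digital signal processors[J]. Journal of Computer Research and Development, 2006, 43(6): 993-1000. Cited by: 29
  • 6 Guo Y, Chheda S, Koren I, et al. Energy-aware Data Prefetching for General-purpose Programs[C]//Proceedings of Power-aware Computer Systems, IEEE CS Press, Portland, USA, 2004: 78-94.
  • 7 Huang Hailin, Xu Tong, Fan Dongrui, Tang Zhimin. Research on design methods for reducing Cache miss penalty in embedded processors[J]. Journal of Chinese Computer Systems, 2006, 27(11): 2077-2081. Cited by: 3
  • 8 Huan Dandan, Li Zusong, Hu Weiwu, Liu Zhiyong. A prefetching strategy combined with the state of the memory access miss queue[J]. Chinese Journal of Computers, 2007, 30(7): 1104-1114. Cited by: 3
  • 9 Jouppi N P. Improving Direct-mapped Cache Performance by the Addition of a Small Fully-associative Cache and Prefetch Buffer[C]//Proc. of 17th Annual Int'l Symposium on Computer Architecture, 1990: 364-373.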
