
A Cache Adaptive Write Allocate Policy (Cache自适应写分配策略)  Cited by: 2
Abstract  The effective bandwidth a processor can deliver is currently a key factor limiting further performance improvement. Based on an analysis of cache store-miss behavior, this paper proposes a new store-miss handling policy that improves processor bandwidth utilization: the cache adaptive write allocate policy. The policy collects fully modified cache blocks in the miss queue, handles them with a non-write-allocate policy, and can adaptively switch to a write allocate policy. Compared with conventional cache store-miss policies, it has a small hardware cost, avoids unnecessary data transfers, reduces cache pollution, and lowers how often the load/store queue fills up. Experimental results show that the policy improves the bandwidth of the STREAM benchmarks by 62.6% on average and the IPC of SPEC CPU2000 programs by 5.9% on average.
Source  Journal of Computer Research and Development (计算机研究与发展), indexed in EI, CSCD and the Peking University Core Journals list; 2007, No. 2, pp. 348-354 (7 pages)
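Funding  National Natural Science Foundation of China, Distinguished Young Scholars program (60325205); National High Technology Research and Development Program of China (863 Program) (2002AA110010, 2005AA110010, 2005AA119020); National Basic Research Program of China (973 Program) (2005CB321601); National Natural Science Foundation of China (60673146)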
Keywords  cache; store miss; write allocate; bandwidth; Godson-2 (龙芯2号)
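
The mechanism summarized in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the design from the paper: the abstract does not give the block size, the miss queue layout, or the condition that triggers the switch to write allocate. The names `MissQueueEntry`, `AdaptiveWriteAllocate`, `BLOCK_SIZE` and `drain` are hypothetical, and the fallback shown here (a partially modified entry that has to leave the queue is fetched and merged) is an assumed trigger.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 32  # bytes per cache block (assumed value, not from the paper)


@dataclass
class MissQueueEntry:
    """One pending store miss; records which bytes of the block were written."""
    block_addr: int
    data: bytearray = field(default_factory=lambda: bytearray(BLOCK_SIZE))
    mask: list = field(default_factory=lambda: [False] * BLOCK_SIZE)

    def merge_store(self, offset: int, payload: bytes) -> None:
        # Merge the store data into the entry and mark those bytes as modified.
        for i, value in enumerate(payload):
            self.data[offset + i] = value
            self.mask[offset + i] = True

    def fully_modified(self) -> bool:
        return all(self.mask)


class AdaptiveWriteAllocate:
    """Hypothetical model: non-write-allocate for fully modified blocks,
    write allocate as the fallback for partially modified ones."""

    def __init__(self, memory: dict):
        self.memory = memory  # block_addr -> bytes, stands in for lower-level memory
        self.cache = {}       # block_addr -> bytearray, blocks allocated in the cache
        self.pending = {}     # block_addr -> MissQueueEntry, the miss queue
        self.fetches = 0      # block reads from memory, i.e. write-allocate traffic

    def store(self, addr: int, payload: bytes) -> None:
        # Assumes a store never crosses a block boundary.
        offset = addr % BLOCK_SIZE
        block_addr = addr - offset
        if block_addr in self.cache:  # store hit: update the cached block
            self.cache[block_addr][offset:offset + len(payload)] = payload
            return
        entry = self.pending.get(block_addr)
        if entry is None:
            entry = MissQueueEntry(block_addr)
            self.pending[block_addr] = entry
        entry.merge_store(offset, payload)
        if entry.fully_modified():
            # Non-write-allocate path: every byte is new, so the old block is
            # never fetched and no cache line is allocated.
            self.memory[block_addr] = bytes(entry.data)
            del self.pending[block_addr]

    def drain(self, block_addr: int) -> None:
        """Assumed fallback: a partially modified entry leaves the queue, so
        the block is fetched, merged and allocated (write allocate)."""
        entry = self.pending.pop(block_addr)
        self.fetches += 1
        block = bytearray(self.memory.get(block_addr, bytes(BLOCK_SIZE)))
        for i, written in enumerate(entry.mask):
            if written:
                block[i] = entry.data[i]
        self.cache[block_addr] = block
```

In this model, a streaming loop that overwrites a whole block reaches memory with no fetch and no cache allocation, which is the traffic and pollution saving the abstract describes. A scattered store that never completes its block still ends up allocated in the cache once `drain` is called.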


