A Cache Adaptive Write Allocate Policy (Chinese title: Cache自适应写分配策略)
Cited by: 2

Abstract: The effective bandwidth a processor can deliver is currently the key factor limiting further performance improvement in modern microprocessors. Based on an analysis of cache store-miss behavior, this paper proposes a new store-miss handling policy, the cache adaptive write allocate policy, which significantly improves the processor's bandwidth utilization. The policy collects fully modified cache blocks in the miss queue and handles them with a non-write-allocate policy, writing them directly to lower-level memory, and it can adaptively switch to a write allocate policy. Compared with traditional store-miss handling policies, the cache adaptive write allocate policy requires little hardware, avoids unnecessary data transfers, reduces cache pollution, and lowers the frequency with which the load/store queue fills up and stalls. Experimental results show that with this policy, the memory bandwidth of the STREAM benchmarks improves by 62.6% on average and the IPC of the SPEC CPU2000 benchmarks improves by 5.9% on average.
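The abstract describes the policy only at a high level, so the following Python sketch is a toy model under stated assumptions, not the authors' Godson-2 implementation. The 32-byte block size, the four-entry queue, the class and function names, and the rule that a queue entry is resolved only when it is displaced from a full queue or drained are all hypothetical. Each miss-queue entry keeps a per-byte write mask; a block whose mask becomes full is written to the lower level without a refill (non-write allocate), and any other block falls back to write allocate and is fetched first.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 32                      # bytes per cache block (assumed)
FULL_MASK = (1 << BLOCK_SIZE) - 1    # one bit per byte of the block


@dataclass
class MissEntry:
    """Hypothetical miss-queue entry that merges stores to one block."""
    block: int
    mask: int = 0  # bit i set => byte i of the block has been written

    def record_store(self, offset: int, size: int) -> None:
        assert offset + size <= BLOCK_SIZE, "store must not cross a block"
        self.mask |= ((1 << size) - 1) << offset

    def fully_modified(self) -> bool:
        return self.mask == FULL_MASK


@dataclass
class AdaptiveWriteAllocate:
    """Toy model: an entry is resolved only when it is pushed out of a
    full queue or drained (an assumed trigger, not the paper's)."""
    capacity: int = 4
    queue: list = field(default_factory=list)
    bytes_read: int = 0        # refill traffic from lower-level memory
    bytes_written: int = 0     # bytes of fully modified blocks sent down
    allocated: set = field(default_factory=set)  # blocks installed in cache

    def store_miss(self, addr: int, size: int) -> None:
        block, offset = divmod(addr, BLOCK_SIZE)
        for entry in self.queue:
            if entry.block == block:       # merge into the pending entry
                entry.record_store(offset, size)
                return
        if len(self.queue) == self.capacity:
            self._resolve(self.queue.pop(0))   # oldest entry leaves first
        entry = MissEntry(block)
        entry.record_store(offset, size)
        self.queue.append(entry)

    def _resolve(self, entry: MissEntry) -> None:
        if entry.fully_modified():
            # Non-write-allocate path: the whole block is new, no refill.
            self.bytes_written += BLOCK_SIZE
        else:
            # Adaptive fallback to write allocate: fetch, merge, install.
            # The later write-back of this dirty block is not modelled.
            self.bytes_read += BLOCK_SIZE
            self.allocated.add(entry.block)

    def drain(self) -> None:
        while self.queue:
            self._resolve(self.queue.pop(0))


def refill_bytes_write_allocate(addrs):
    """Conventional write allocate: every distinct missed block is fetched
    (evictions and repeated misses are not modelled)."""
    return len({a // BLOCK_SIZE for a in addrs}) * BLOCK_SIZE


if __name__ == "__main__":
    stores = list(range(0, 4 * BLOCK_SIZE, 8))  # 8-byte stores filling 4 blocks
    stores.append(10 * BLOCK_SIZE)              # one partial store to block 10
    policy = AdaptiveWriteAllocate()
    for addr in stores:
        policy.store_miss(addr, 8)
    policy.drain()
    print(policy.bytes_read, policy.bytes_written,
          refill_bytes_write_allocate(stores))  # prints: 32 128 160
```

In this toy run, the four fully overwritten blocks leave the queue without any refill and only the partially written block is fetched, so refill traffic drops from 160 to 32 bytes. The model ignores timing, cache eviction, and the eventual write-back of allocated blocks, so it only illustrates where the savings come from; it does not predict the measured STREAM and SPEC CPU2000 gains reported in the abstract.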

Authors: Huan Dandan (郇丹丹), Li Zusong (李祖松), Hu Weiwu (胡伟武), Liu Zhiyong (刘志勇)

Affiliation: Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2007, No. 2, pp. 348-354 (7 pages). Indexed in EI, CSCD, and the Peking University core journal list.

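Funding: National Science Fund for Distinguished Young Scholars of the National Natural Science Foundation of China (60325205); National High Technology Research and Development Program of China (863 Program) (2002AA110010, 2005AA110010, 2005AA119020); National Basic Research Program of China (973 Program) (2005CB321601); National Natural Science Foundation of China (60673146)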

Keywords: cache; store miss; write allocate; bandwidth; Godson-2 (龙芯2号)

CLC number: TP302 [Automation and Computer Technology: Computer System Architecture]
