
针对组相联缓存的无效缓存路访问混合过滤机制研究 (cited by: 2)

Hybrid Approach of Filtering Unnecessary Way Accesses for Set-Associative Caches
Abstract: Power consumption has become one of the key issues in processor design in recent years. Conventional approaches to managing power, such as DVFS (Dynamic Voltage and Frequency Scaling), are now running into the law of diminishing returns. As multi-core and many-core processors become mainstream, on-chip caches account for a growing share of CPU die area and power. To reduce power, this paper proposes filtering unnecessary cache way accesses in order to lower the dynamic power consumption of caches shared by instructions and data. Three filters are used: the Invalid Filter, which eliminates accesses to cache ways that hold invalid blocks; the I/D Filter, which eliminates accesses to cache ways whose blocks do not match the access type (instruction or data); and the Tag-2 Filter, which eliminates accesses to cache ways whose blocks mismatch on the lowest 2 bits of the tag. Because these methods reduce the activity inside the cache, dynamic CPU power can be decreased significantly. The paper also proposes combining the three methods, referred to as the Invalid+I/D+Tag-2 Filter, in an attempt to achieve better power savings. Analysis and experiments verify the effectiveness and complementarity of the three methods. The experiments also show that, compared with the Invalid+I/D Filter, the Invalid+I/D+Tag-2 Filter achieves an improvement of 19.6% to 47.8% (34.3% on average) on a 64KB 4-way set-associative cache and 19.6% to 55.2% (39.2% on average) on a 128KB 8-way set-associative cache. Compared with the Invalid+Tag-2 Filter, it achieves an improvement of 16.1% to 27.7% (16.6% on average) on a 64KB 4-way set-associative cache and 6.9% to 44.4% (25.0% on average) on a 128KB 8-way set-associative cache.
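The three filters described in the abstract can be illustrated with a small software model. The following is only a minimal sketch under stated assumptions, not the paper's hardware design: the `Way` record, the function names `ways_to_probe` and `lookup`, and the choice to count "ways probed" as a stand-in for dynamic power are all hypothetical. It also assumes that a block is only ever accessed with the same type (instruction or data) it was filled with, so the I/D Filter never hides a way that would have hit.

```python
from dataclasses import dataclass

TAG_LOW_BITS = 2  # "Tag-2": compare only the lowest 2 bits of the tag


@dataclass
class Way:
    """Hypothetical per-way state for one cache set."""
    valid: bool = False
    is_instruction: bool = False  # type of the block currently held
    tag: int = 0


def ways_to_probe(cache_set, tag, is_instruction):
    """Return the indices of the ways that survive all three filters."""
    mask = (1 << TAG_LOW_BITS) - 1
    survivors = []
    for i, way in enumerate(cache_set):
        if not way.valid:                          # Invalid Filter
            continue
        if way.is_instruction != is_instruction:   # I/D Filter
            continue
        if (way.tag & mask) != (tag & mask):       # Tag-2 Filter
            continue
        survivors.append(i)
    return survivors


def lookup(cache_set, tag, is_instruction):
    """Probe only the surviving ways; return (hit way or None, ways probed)."""
    probed = ways_to_probe(cache_set, tag, is_instruction)
    for i in probed:
        if cache_set[i].tag == tag:
            return i, len(probed)
    return None, len(probed)


# One 4-way set: a matching data block, an invalid way,
# an instruction block, and a data block with different low tag bits.
demo_set = [
    Way(valid=True, is_instruction=False, tag=0b1101),
    Way(valid=False, is_instruction=False, tag=0b1101),
    Way(valid=True, is_instruction=True, tag=0b1001),
    Way(valid=True, is_instruction=False, tag=0b0110),
]
print(lookup(demo_set, 0b1101, False))  # (0, 1): hit in way 0, 1 of 4 ways probed
```

In this example each filter removes a different way, which mirrors the complementarity the abstract claims: the invalid way, the instruction block, and the block with mismatching low tag bits are each skipped for a different reason, leaving a single way to probe.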
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2013, Issue 4, pp. 799-808 (10 pages)
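Funding: Supported by the National Basic Research Program of China (973 Program) (2011CB302501), the National Science Fund for Distinguished Young Scholars (60925009), the Science Fund for Creative Research Groups of the National Natural Science Foundation of China (60921002), the Young Scientists Fund of the National Natural Science Foundation of China (61100013, 61202059), the Beijing Nova Program (2010B058), and a Huawei-funded research project (YBCB2011030).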
Keywords: set-associative cache; dynamic power; invalid filter; I/D filter; Tag-2 filter


