Research on a Hybrid Filtering Mechanism for Unnecessary Cache Way Accesses in Set-Associative Caches (times cited: 2)

Hybrid Approach of Filtering Unnecessary Way Accesses for Set-Associative Caches

Abstract: Power consumption has been a key issue in processor design for several years. Conventional approaches such as DVFS (Dynamic Voltage and Frequency Scaling) are now subject to the law of diminishing returns. As multi-/many-core processors become mainstream, on-chip caches account for a growing share of CPU die area and power. To reduce cache dynamic power, this paper proposes filtering unnecessary way accesses in caches shared by instructions and data. The approach comprises three filters: the Invalid Filter, which eliminates accesses to ways holding invalid blocks; the I/D Filter, which eliminates accesses to ways whose blocks do not match the access type (instruction or data); and the Tag-2 Filter, which eliminates accesses to ways whose blocks mismatch in the lowest 2 tag bits. Because these methods reduce activity in the cache, they can significantly decrease dynamic CPU power. The paper further proposes combining the three methods, called Invalid+I/D+Tag-2 Filter, to achieve greater power savings. Analysis and experiments verify the effectiveness and complementarity of the three methods. Compared with Invalid+I/D Filter, Invalid+I/D+Tag-2 Filter improves effectiveness by 19.6% to 47.8% (34.3% on average) on a 64KB 4-way set-associative cache and by 19.6% to 55.2% (39.2% on average) on a 128KB 8-way set-associative cache. Compared with Invalid+Tag-2 Filter, it improves effectiveness by 16.1% to 27.7% (16.6% on average) on the 64KB 4-way cache and by 6.9% to 44.4% (25.0% on average) on the 128KB 8-way cache.
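The way-filtering idea above can be illustrated with a minimal Python sketch of a single set lookup. This is not the authors' implementation: the names `WayMeta`, `ways_to_probe`, and `lookup` are hypothetical, and the sketch assumes each way keeps a valid bit, a one-bit instruction/data flag set by the access that filled the block, and a copy of the lowest 2 tag bits that can be checked before the full tag and data arrays are read. A way is skipped only when it cannot hit: an invalid block never hits, a block of the other access type is treated as non-matching (the I/D assumption), and if the low 2 tag bits differ the full tag cannot match either.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Tag-2 Filter (as assumed here): compare only the 2 lowest tag bits first.
TAG_LOW_BITS = 2
TAG_LOW_MASK = (1 << TAG_LOW_BITS) - 1


@dataclass
class WayMeta:
    """Hypothetical per-way metadata, assumed readable before the full arrays."""
    valid: bool     # checked by the Invalid Filter
    is_instr: bool  # True if filled by an instruction fetch (I/D Filter)
    tag: int        # full tag; the Tag-2 Filter looks only at its low bits


def ways_to_probe(set_meta: List[WayMeta], tag: int, is_instr: bool,
                  use_invalid: bool = True, use_id: bool = True,
                  use_tag2: bool = True) -> List[int]:
    """Return indices of ways that still need a full tag/data access."""
    probe = []
    for way, meta in enumerate(set_meta):
        if use_invalid and not meta.valid:
            continue  # an invalid block can never hit
        if use_id and meta.is_instr != is_instr:
            continue  # access-type mismatch: treated as unable to hit
        if use_tag2 and (meta.tag & TAG_LOW_MASK) != (tag & TAG_LOW_MASK):
            continue  # low tag bits differ, so the full tag cannot match
        probe.append(way)
    return probe


def lookup(set_meta: List[WayMeta], tag: int,
           is_instr: bool) -> Tuple[Optional[int], int]:
    """Return (hit way or None, ways probed) with all three filters enabled."""
    probe = ways_to_probe(set_meta, tag, is_instr)
    for way in probe:
        if set_meta[way].tag == tag:  # full tag compare only on surviving ways
            return way, len(probe)
    return None, len(probe)


if __name__ == "__main__":
    # Hypothetical 4-way set, looked up by a data access with tag 0b1010.
    s = [WayMeta(True, False, 0b1010),   # valid data block, tag matches
         WayMeta(False, False, 0b1110),  # invalid block
         WayMeta(True, True, 0b1010),    # instruction block
         WayMeta(True, False, 0b0111)]   # data block, low tag bits 0b11
    print(ways_to_probe(s, 0b1010, is_instr=False, use_tag2=False))  # [0, 3]
    print(lookup(s, 0b1010, is_instr=False))                          # (0, 1)
```

In this example, Invalid+I/D filtering alone still reads ways 0 and 3, while adding the Tag-2 check leaves only way 0; this illustrates, under the stated assumptions, why the filters can be complementary. How the paper actually stores the low tag bits, and how it treats a block accessed both as instruction and as data, is not stated here.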

Authors: Fan Lingjun, Xu Yuanchao, Shi Weisong, Fan Dongrui, Lou Jie

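Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; College of Information Engineering, Capital Normal University; Department of Computer Science, Wayne State University; Intel Corporation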

Source: Chinese Journal of Computers (indexed in EI, CSCD, and the Peking University Core Journals list), 2013, No. 4, pp. 799-808 (10 pages)

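Funding: National Basic Research Program of China (973 Program) (2011CB302501); National Science Fund for Distinguished Young Scholars (60925009); Science Fund for Creative Research Groups of the National Natural Science Foundation of China (60921002); Young Scientists Fund of the National Natural Science Foundation of China (61100013, 61202059); Beijing Nova Program (2010B058); Huawei-funded project (YBCB2011030)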

Keywords: set-associative cache; dynamic power; invalid filter; I/D filter (access-type filter); Tag-2 filter (low tag-bit filter)

CLC number: TP302 [Automation and Computer Technology: Computer System Architecture]