Using Write Mask to Support Hybrid Write-Back and Write-Through Cache Policy on Many-Core Architectures

Original title: 众核处理器中使用写掩码实现混合写回/写穿透策略 (cited by: 5)

Abstract: A write-back cache policy greatly reduces the on-chip network and memory bandwidth consumed by write operations, which matters especially for many-core chips (more than 16 cores). Conventional multi-core systems rely on directory-based or bus-based write-invalidate or write-update protocols (such as directory-based MESI), which are complex and scale poorly. Instead, the authors implement a scope consistency memory model and a hardware-lock-based cache coherence protocol on chip, and add a write mask to each L1 data cache line to record which bytes have been updated locally. This solves the coherence problem that false sharing causes under a write-back policy. Two methods further reduce the storage overhead of the masks. First, store instructions of 1 to 3 bytes, which occur rarely in programs, are treated as write-through, so that one mask bit covers every 4 bytes in L1; this shrinks the chip area of the write mask to 27.9% of the byte-granularity design. Second, a multi-way write mask buffer whose entry count is 12.5% of the total number of L1 cache blocks reduces the area overhead to 17.7% of the byte-granularity design without performance loss. The 64-core Godson-T platform, which uses the scope consistency memory model, uses write masks to implement a hybrid write-back/write-through policy: write-through inside critical sections (scopes where data races are possible) and write-back outside them. Three SPLASH-2 programs and two bioinformatics programs are evaluated. Compared with pure write-through, the hybrid policy generally improves performance by more than 24% with 32 and 64 threads, performs slightly better than pure write-back, and loses no performance under either storage optimization.
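The write-mask idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the Godson-T hardware design: the 32-byte line size, the names `L1Line`, `Core`, `store`, and `write_back`, and the use of an explicit `write_back()` call in place of lock-release actions are all hypothetical. It models the 4-byte mask granularity, with stores shorter than 4 bytes and stores inside a critical section going write-through.

```python
LINE_SIZE = 32  # bytes per cache line (assumed value, not taken from the paper)
WORD = 4        # one mask bit covers 4 bytes, as in the first optimization


class L1Line:
    """Hypothetical model of one L1 data-cache line with a per-word write mask."""

    def __init__(self, base, memory):
        self.base = base
        self.data = bytearray(memory[base:base + LINE_SIZE])  # private copy
        self.mask = [False] * (LINE_SIZE // WORD)             # locally written words


class Core:
    """Hypothetical core with a private L1 in front of a shared memory."""

    def __init__(self, memory):
        self.memory = memory  # shared backing store (stands in for L2/DRAM)
        self.lines = {}

    def _line(self, addr):
        base = addr - addr % LINE_SIZE
        if base not in self.lines:
            self.lines[base] = L1Line(base, self.memory)
        return self.lines[base]

    def store(self, addr, value, in_critical_section=False):
        """Store 1 to 4 bytes; the store is assumed not to cross a line."""
        line = self._line(addr)
        off = addr - line.base
        line.data[off:off + len(value)] = value
        aligned_word = len(value) == WORD and off % WORD == 0
        if in_critical_section or not aligned_word:
            # Write-through: sub-word stores and critical-section stores
            # update shared memory immediately and need no mask bit.
            self.memory[addr:addr + len(value)] = value
        else:
            # Write-back: only remember which word was written locally.
            line.mask[off // WORD] = True

    def write_back(self):
        """Flush only the masked words, then drop the lines."""
        for line in self.lines.values():
            for w, dirty in enumerate(line.mask):
                if dirty:
                    lo = w * WORD
                    start = line.base + lo
                    self.memory[start:start + WORD] = line.data[lo:lo + WORD]
        self.lines.clear()


# Two cores write different words of the same line (false sharing).
memory = bytearray(64)
a, b = Core(memory), Core(memory)
a.store(0, b"\x01\x02\x03\x04")   # word 0, write-back
b.store(4, b"\x05\x06\x07\x08")   # word 1 of the same line, write-back
a.store(8, b"\xaa")               # 1-byte store, write-through
through_before_flush = memory[8]  # already visible in shared memory
a.write_back()
b.write_back()
```

Because each core flushes only the words its mask marks, the second write-back does not overwrite the first core's data with a stale copy of the line, which is the false-sharing hazard a whole-line write-back would create.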

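Authors: Lin Wei (林伟), Ye Xiaochun (叶笑春), Song Fenglong (宋风龙), Zhang Hao (张浩)

Affiliations: Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences; Graduate University of the Chinese Academy of Sciences

Source: Chinese Journal of Computers (《计算机学报》), 2008, No. 11, pp. 1918-1928 (11 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: Supported by the Key Program of the National Natural Science Foundation of China (60736012) and the National Basic Research Program of China ("973" Program) (2005CB321600).

Keywords: many-core; write mask; write mask buffer; scope consistency; false sharing; write-invalidate; write-update

Classification: TP302 [Automation and Computer Technology: Computer System Architecture]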

