Using Write Mask to Support Hybrid Write-Back and Write-Through Cache Policy on Many-Core Architectures
Abstract: A write-back cache policy greatly reduces the on-chip network and memory bandwidth consumed by write operations, which is especially important for on-chip many-core architectures (more than 16 cores). Conventional multi-core systems rely on directory-based or bus-based write-invalidate or write-update protocols, such as directory-based MESI, which are complex and scale poorly. As an alternative, the authors implement the scope consistency memory model on chip, together with a cache coherence protocol based on hardware locks, and propose keeping a write mask in the L1 data cache that records which bytes of each cache block have been updated locally. This solves the cache coherence problem that false sharing causes under a write-back policy.

Two further methods reduce the storage overhead of the write mask. First, store instructions that write only 1 to 3 bytes, which occur rarely in programs, are treated as write-through, so the L1 cache needs only one mask bit per 4 bytes; this compresses the chip area of the write mask to 27.9% of that of the byte-granularity design. Second, a multi-way write mask buffer whose number of entries is 12.5% of the total number of L1 cache blocks compresses the area overhead to 17.7% of that of the byte-granularity design without loss of performance.

The 64-core Godson-T platform, which adopts scope consistency, uses the write mask to implement a hybrid write-back/write-through cache policy: writes are write-through inside critical sections, where data races are possible, and write-back outside them. The evaluation uses three SPLASH-2 programs and two bioinformatics programs. Compared with pure write-through, the hybrid policy generally improves performance by more than 24% in the 32-thread and 64-thread configurations and performs slightly better than pure write-back. Applying the two storage optimizations causes no performance loss.
Source: Chinese Journal of Computers (《计算机学报》), 2008, No. 11, pp. 1918-1928 (11 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
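Funding: Supported by the Key Program of the National Natural Science Foundation of China (60736012) and the National Basic Research Program of China (973 Program) (2005CB321600).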
Keywords: many-core; write mask; write mask buffer; scope consistency; false sharing; write-invalidate; write-update
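
The write-mask idea in the abstract can be illustrated with a small software model. The sketch below is not the authors' hardware design: the 32-byte line size, the class and function names (`CacheLine`, `write_back`, `demo`, `naive_demo`), and the two-core scenario are all assumptions made for illustration. It shows two cores holding private copies of the same line and writing disjoint bytes (false sharing). When each write-back merges only the masked bytes, both updates survive; when a whole line is written back, the later write-back overwrites the earlier one.

```python
from dataclasses import dataclass, field

LINE_SIZE = 32  # bytes per cache line; an assumed value, not taken from the paper


@dataclass
class CacheLine:
    """A private L1 copy of one memory line plus a per-byte write mask."""
    data: bytearray
    mask: list = field(default_factory=lambda: [False] * LINE_SIZE)

    def store(self, offset: int, payload: bytes) -> None:
        """Write into the local copy and record which bytes were touched."""
        for i, value in enumerate(payload):
            self.data[offset + i] = value
            self.mask[offset + i] = True


def load_line(memory: bytearray) -> CacheLine:
    """Fetch a private copy of the line; the mask starts out clear."""
    return CacheLine(data=bytearray(memory))


def write_back(line: CacheLine, memory: bytearray) -> None:
    """Merge only the locally written bytes into memory, then clear the mask."""
    for i, written in enumerate(line.mask):
        if written:
            memory[i] = line.data[i]
    line.mask = [False] * LINE_SIZE


def write_back_whole_line(line: CacheLine, memory: bytearray) -> None:
    """Mask-free write-back: the entire line overwrites memory."""
    memory[:] = line.data


def demo() -> bytearray:
    """Two cores write disjoint bytes of one line, then write back with masks."""
    memory = bytearray(LINE_SIZE)
    core0 = load_line(memory)
    core1 = load_line(memory)
    core0.store(0, b"\xAA" * 4)  # core 0 updates bytes 0..3
    core1.store(4, b"\xBB" * 4)  # core 1 updates bytes 4..7
    write_back(core0, memory)
    write_back(core1, memory)    # core 1's stale bytes 0..3 are not written
    return memory


def naive_demo() -> bytearray:
    """Same scenario without masks: core 1's stale bytes clobber core 0's."""
    memory = bytearray(LINE_SIZE)
    core0 = load_line(memory)
    core1 = load_line(memory)
    core0.store(0, b"\xAA" * 4)
    core1.store(4, b"\xBB" * 4)
    write_back_whole_line(core0, memory)
    write_back_whole_line(core1, memory)
    return memory
```

In this model, `demo()` leaves bytes 0 to 3 set by core 0 and bytes 4 to 7 set by core 1, while `naive_demo()` loses core 0's update. The 4-byte-granularity optimization described in the abstract would correspond to keeping one mask entry per 4-byte word and sending shorter stores straight to memory; that variant is not modeled here.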