Architecture Supported Synchronization-Based Cache Coherence Protocol for Many-Core Processors

Times cited: 7
Abstract: Efficiently supporting cache coherence in shared-memory systems is a key and difficult problem in architecture design, and it is critical to designing and implementing many-core processors. Existing directory-based protocols are hard to implement, complex to verify, and incur large storage overhead. For on-chip many-core processors, this paper proposes a hardware-supported, synchronization-based cache coherence protocol. Its distinguishing feature is that it uses no directory at all: coherence information is represented with Bloom filters, and, inspired by the scope consistency memory model, coherence is maintained at the synchronization points of parallel programs. Within a critical section, each processor core records its write set (the cache lines written in the critical section) in a Bloom filter. When the core releases the lock, the write set is transferred to a synchronization manager. When another core acquires the same lock, it obtains the write set from the synchronization manager and invalidates the stale data in its local cache. Compared with existing directory-based protocols, the scheme reduces implementation and verification complexity. Evaluation with programs from the SPLASH-2 benchmark suite shows that the synchronization-based protocol achieves performance comparable to, and more cost-effectively than, a directory-based protocol that requires a large amount of hardware resources and substantial design-verification effort.
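The abstract describes the protocol only in terms of lock release and acquire. The Python sketch below models that flow for one lock so the roles of the write set and the synchronization manager are concrete. It is an illustration under assumptions, not the paper's hardware design: the class names `BloomFilter`, `SyncManager`, and `Core`, the filter size (256 bits, 3 hash functions), the SHA-256-based hashing, writing dirty lines back to shared memory at release, and keeping the union of all write sets per lock in the manager are all hypothetical choices.

```python
import hashlib


class BloomFilter:
    """Hypothetical fixed-size Bloom filter over cache-line addresses."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit vector stored in a Python int

    def _positions(self, line):
        # Derive k bit positions from k salted SHA-256 digests of the address.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{line}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, line):
        for pos in self._positions(line):
            self.bits |= 1 << pos

    def might_contain(self, line):
        # False positives are possible; false negatives are not.
        return all((self.bits >> pos) & 1 for pos in self._positions(line))

    def union(self, other):
        # Assumes both filters use the same size and hash functions.
        self.bits |= other.bits


class SyncManager:
    """Hypothetical synchronization manager: lock id -> accumulated write set."""

    def __init__(self):
        self.write_sets = {}

    def on_release(self, lock, write_set):
        # Assumption: keep the union of all write sets released under this lock,
        # so a core that skipped several critical sections still sees their writes.
        self.write_sets.setdefault(lock, BloomFilter()).union(write_set)

    def on_acquire(self, lock):
        return self.write_sets.get(lock, BloomFilter())


class Core:
    """A core with a private cache and no directory state."""

    def __init__(self, memory, manager):
        self.memory = memory    # shared memory: line address -> value
        self.manager = manager
        self.cache = {}         # private cache: line address -> value
        self.dirty = set()      # lines written but not yet written back
        self.write_set = BloomFilter()

    def read(self, line):
        if line not in self.cache:  # miss: fetch from shared memory
            self.cache[line] = self.memory.get(line, 0)
        return self.cache[line]

    def write(self, line, value):
        self.cache[line] = value
        self.dirty.add(line)
        self.write_set.add(line)    # record the line in the write set

    def acquire(self, lock):
        # Invalidate every clean cached line that may have been written under this lock.
        stale = self.manager.on_acquire(lock)
        for line in list(self.cache):
            if line not in self.dirty and stale.might_contain(line):
                del self.cache[line]

    def release(self, lock):
        # Assumption: dirty lines are written back to shared memory at release.
        for line in self.dirty:
            self.memory[line] = self.cache[line]
        self.dirty.clear()
        self.manager.on_release(lock, self.write_set)
        self.write_set = BloomFilter()  # start a fresh write set


memory = {0x40: 0}
manager = SyncManager()
c0, c1 = Core(memory, manager), Core(memory, manager)
c0.read(0x40)            # c0 caches line 0x40 with value 0
c1.acquire("L")
c1.write(0x40, 42)
c1.release("L")          # write-back, then the write set goes to the manager
c0.acquire("L")          # c0 invalidates its stale copy of 0x40
print(c0.read(0x40))     # 42
```

Because a Bloom filter can report false positives but never false negatives, an acquiring core may invalidate a few lines that were never written (costing only an extra miss), but it never keeps a stale copy of a line that was written. The per-lock union in this sketch grows without bound and accumulates false positives; how the actual design bounds or clears this state is not stated in the abstract.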
Source: Chinese Journal of Computers, 2009, No. 8, pp. 1618-1630 (13 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
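Funding: Supported by the Key Program of the National Natural Science Foundation of China (60736012) and the National Basic Research Program of China ("973" Program) (2005CB321600).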
Keywords: cache coherence; memory consistency model; many-core processors; shared memory system
