
MPD: A CC-NUMA System with Clumps Having Multiple Parallel Cache Coherency Domains
Abstract Large-scale cache coherence non-uniform memory access (CC-NUMA) systems usually adopt a two-tier coherency-domain architecture to reduce the overhead of maintaining the cache coherence protocol and to improve system performance. In a two-tier system, multiple processors and a coherence chip are interconnected to form an intra-clump (intra-node) cache coherency domain, and the coherence chips of multiple clumps are interconnected through a system interconnection network to form an inter-clump cache coherency domain. Every processor occupies at least one processor ID in its coherency domain, and both the number of IDs a processor can distinguish and its direct-connection capability are limited, so the size of a single clump is bounded and the system can scale only by adding clumps. The resulting large clump count complicates the inter-clump topology and sharply raises the bandwidth consumption and latency of cross-clump memory accesses, which hinders effective performance scaling. To solve this problem, we propose MPD, a method of constructing multiprocessor systems in which each clump contains multiple parallel cache coherency domains. MPD removes the limit that processor direct-connection capability and the number of distinguishable IDs place on clump size, greatly reduces the number of clumps, and turns part of the inter-clump accesses into intra-clump accesses. Compared with a traditional CC-NUMA system, an MPD system both significantly reduces topological complexity and effectively improves performance. Theoretical analysis and simulation results show that, in a 32-way system built from the same processors, an MPD system with 4 parallel cache coherency domains per clump reduces the number of clumps by 75%, saves more than 40% of coherence directory storage, lowers average access latency by about 27.9%, and improves overall system performance by about 14.4%.
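The 75% figure for clump count can be checked with simple arithmetic. The sketch below is illustrative only: the function name `node_count` and the choice of 4 processors per coherency domain are assumptions made for this example and are not taken from the paper. It shows that placing 4 parallel domains in each clump quarters the number of clumps in a 32-way system.

```python
def node_count(total_processors: int,
               processors_per_domain: int,
               domains_per_node: int = 1) -> int:
    """Number of clumps (nodes) needed to host all processors.

    processors_per_domain is a hypothetical parameter: the number of
    processors that one intra-clump coherency domain can hold.
    """
    per_node = processors_per_domain * domains_per_node
    return -(-total_processors // per_node)  # ceiling division


# Assumed for illustration: 4 processors per coherency domain.
baseline = node_count(32, 4)                      # one domain per clump
mpd = node_count(32, 4, domains_per_node=4)       # four parallel domains per clump
reduction = 1 - mpd / baseline

print(baseline, mpd, f"{reduction:.0%}")
```

The same ratio holds for other domain sizes that divide 32 evenly (for example, 2 processors per domain gives 16 clumps versus 4), because the reduction depends only on the number of parallel domains per clump.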
Authors Chen Jicheng; Zhao Yaqian; Li Yihan; Wang Endong; Shi Hongzhi; Tang Shibin (State Key Laboratory of High-End Server & Storage Technology (Inspur Group Company Limited), Beijing 100085)
Source Journal of Computer Research and Development (indexed in EI, CSCD, Peking University Core Journals), 2017, No. 4, pp. 775-786 (12 pages)
Funding National High Technology Research and Development Program of China (863 Program) (2013AA011701)
Keywords CC-NUMA (cache coherence non-uniform memory access) system; two-tier architecture; multiple parallel cache coherency domain (MPD); coherence chip (CC); system scalability
