MPD: A CC-NUMA System with Clumps Having Multiple Parallel Cache Coherency Domains


Abstract: Large-scale cache coherence non-uniform memory access (CC-NUMA) systems usually adopt a two-tier coherency domain architecture to reduce the overhead of maintaining the cache coherence protocol and to improve system performance. In a two-tier system, multiple processors and a coherence chip are interconnected to form an intra-clump (intra-node) cache coherency domain, and the coherence chips of multiple clumps are interconnected through a system interconnection network to form an inter-clump cache coherency domain. Every processor occupies at least one processor ID in its coherency domain, and both the number of IDs a processor can distinguish and its direct-connection capability are limited. The size of a single clump is therefore bounded, and the system can be scaled only by adding clumps, not by enlarging them. As a result, large-scale CC-NUMA systems end up with many clumps and a complex inter-clump topology, the bandwidth demand and latency of cross-clump memory accesses grow sharply, and system performance no longer scales effectively. To solve this problem, we propose MPD, a method of constructing multiprocessor systems in which each clump contains multiple parallel cache coherency domains. MPD removes the limit that processor direct-connection capability and the number of distinguishable processor IDs impose on clump size, greatly reduces the number of clumps, and converts part of the inter-clump accesses into intra-clump accesses, so that system performance scales effectively. Theoretical analysis and simulation results show that, in a 32-way system built from the same processors, an MPD system with 4 parallel cache coherency domains per clump reduces the number of clumps by 75%, saves more than 40% of coherence directory storage, lowers average access latency by about 27.9%, and improves overall system performance by about 14.4%.
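The 75% figure for clump reduction follows from simple counting, which the short Python sketch below illustrates. It assumes every coherency domain holds the same number of processors; the value of 4 processors per domain and the function names `node_count` and `node_reduction` are hypothetical choices for illustration and do not come from the paper. Only the 32-way system size and the 4 parallel domains per clump are taken from the abstract.

```python
def node_count(total_processors: int, processors_per_domain: int,
               domains_per_node: int) -> int:
    """Clumps needed when each clump holds `domains_per_node` parallel domains.

    Assumption (not from the paper): all domains are the same size and the
    processors divide evenly among the clumps.
    """
    per_node = processors_per_domain * domains_per_node
    if total_processors % per_node != 0:
        raise ValueError("processors do not divide evenly into clumps")
    return total_processors // per_node


def node_reduction(total_processors: int, processors_per_domain: int,
                   domains_per_node: int) -> float:
    """Fraction of clumps saved relative to one domain per clump."""
    baseline = node_count(total_processors, processors_per_domain, 1)
    mpd = node_count(total_processors, processors_per_domain, domains_per_node)
    return 1 - mpd / baseline


# 32-way system; 4 processors per domain is a hypothetical value.
baseline_nodes = node_count(32, 4, 1)   # one domain per clump
mpd_nodes = node_count(32, 4, 4)        # four parallel domains per clump
reduction = node_reduction(32, 4, 4)
print(baseline_nodes, mpd_nodes, f"{reduction:.0%}")
```

Under the equal-size assumption the reduction is 1 - 1/d for d domains per clump, so it does not depend on the hypothetical per-domain processor count. The directory storage, latency, and performance figures in the abstract come from the authors' analysis and simulation and cannot be derived this way.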

Authors: Chen Jicheng; Zhao Yaqian; Li Yihan; Wang Endong; Shi Hongzhi; Tang Shibin (State Key Laboratory of High-End Server & Storage Technology (Inspur Group Company Limited), Beijing 100085)

Affiliation: State Key Laboratory of High-End Server & Storage Technology (Inspur Group Company Limited)

Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2017, No. 4, pp. 775-786 (12 pages)

Funding: National High Technology Research and Development Program of China (863 Program) (2013AA011701)

Keywords: CC-NUMA (cache coherence non-uniform memory access) system; two-tier architecture; multiple parallel cache coherency domain (MPD); coherence chip (CC); system scalability

Classification: TP303 [Automation and Computer Technology: Computer System Architecture]



