Research on Key Techniques for Efficient Collective Communication in Multicore Environments (times cited: 6)

English title: Research of Collectives Optimization on Modern Multicore Clusters

Abstract: With the rapidly growing demand for high-performance computing (HPC), multicore processors have been widely deployed in HPC systems. To keep applications efficient on large-scale systems, communication must keep pace with computation, so the widespread use of multicore places higher demands on the communication system. Collective communication is an important part of the communication system and is critical to overall system performance, which makes it important to study how the multicore environment affects collective performance. This paper first analyzes how multicore architectures affect collective communication. Multicore SMP clusters are found to have two conflicting effects. They provide a faster intra-socket communication path, which can speed up collective operations, but they also introduce memory and cache contention, which can degrade communication performance. Based on these characteristics of multicore processors (shared caches and memory contention), the paper proposes multicore-aware collective optimization techniques: hierarchy-aware algorithms, limited concurrency, NUMA-aware algorithms, and cache-friendly optimization. These techniques are implemented and evaluated in MPI_Barrier, MPI_Bcast, and MPI_Alltoall. Experimental results show that they effectively exploit the characteristics of multicore architectures, reduce the impact of contention, and improve the performance and scalability of collective communication in multicore environments.
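The abstract names hierarchy-aware algorithms but does not describe them. The Python below is a minimal sketch of the general idea only, not the authors' algorithm. It builds a two-level broadcast schedule in the spirit of MPI_Bcast. First, a binomial tree runs among one leader process per node. Then concurrent binomial trees run inside every node. Several details are illustrative assumptions rather than details from the paper:

- node-major rank numbering;
- a root at global rank 0;
- core 0 acting as each node's leader;
- the helper names `binomial_steps`, `hierarchical_bcast_schedule`, and `count_messages`.

```python
from typing import List, Tuple

# One communication round: a list of (sender_rank, receiver_rank) pairs.
Step = List[Tuple[int, int]]


def binomial_steps(n: int) -> List[Step]:
    """Binomial-tree broadcast from index 0 among n participants.

    In the round with distance `mask`, every index i < mask (all of which
    already hold the data) sends it to i + mask, if that index exists.
    """
    steps: List[Step] = []
    mask = 1
    while mask < n:
        steps.append([(i, i + mask) for i in range(mask) if i + mask < n])
        mask <<= 1
    return steps


def hierarchical_bcast_schedule(num_nodes: int, cores_per_node: int) -> List[Step]:
    """Hypothetical two-level broadcast rooted at global rank 0.

    Assumes node-major numbering (rank = node * cores_per_node + core)
    and uses core 0 of each node as that node's leader.
    """
    schedule: List[Step] = []
    # Phase 1: inter-node binomial tree among the node leaders only.
    for step in binomial_steps(num_nodes):
        schedule.append([(a * cores_per_node, b * cores_per_node) for a, b in step])
    # Phase 2: every node runs the same intra-node binomial tree concurrently.
    for step in binomial_steps(cores_per_node):
        schedule.append([
            (node * cores_per_node + a, node * cores_per_node + b)
            for node in range(num_nodes)
            for a, b in step
        ])
    return schedule


def count_messages(schedule: List[Step], cores_per_node: int) -> Tuple[int, int]:
    """Return (inter_node, intra_node) message counts for a schedule,
    assuming node-major numbering so that node = rank // cores_per_node."""
    inter = intra = 0
    for step in schedule:
        for src, dst in step:
            if src // cores_per_node == dst // cores_per_node:
                intra += 1
            else:
                inter += 1
    return inter, intra


if __name__ == "__main__":
    sched = hierarchical_bcast_schedule(num_nodes=4, cores_per_node=4)
    print(len(sched))                                            # 4 rounds
    print(count_messages(sched, cores_per_node=4))               # (3, 12)
    print(count_messages(binomial_steps(16), cores_per_node=4))  # (12, 3)
```

Consider 4 nodes with 4 cores each. The two-level schedule finishes in 4 rounds and sends only 3 of its 15 messages across nodes. A flat binomial tree over the same node-major numbering (`binomial_steps(16)`) also uses 4 rounds, but it sends 12 of 15 messages across nodes. The sketch counts messages only. It does not model the memory and cache contention, limited concurrency, or NUMA placement that the abstract also discusses.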

Authors: Zhang Panyong, Meng Dan, Huo Zhigang

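Affiliations: National Research Center for Intelligent Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences; Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences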

Source: Chinese Journal of Computers, 2010, No. 2, pp. 317-325 (9 pages); indexed in EI, CSCD, and the Peking University core journal list

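Funding: Supported by the National High Technology Research and Development Program of China (863 Program) project "Dawning 5000A High-Productivity Computer" (2006AA01A102).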

Keywords: HPC; multicore clusters; collectives optimization; NUMA; MPI

Chinese Library Classification: TP302 [Automation and Computer Technology / Computer System Architecture]
