
多核环境下高效集合通信关键技术研究 (cited by: 6)

Research of Collectives Optimization on Modern Multicore Clusters
Abstract: As demand for high-performance computing (HPC) grows, multicore processors have become widely deployed in HPC systems. Keeping such systems efficient requires balancing computation and communication, so the spread of multicore places higher demands on the efficiency of the communication system. Collective communication is an important part of that system, which makes efficient collectives in multicore environments well worth studying. This paper first analyzes how multicore affects collective communication performance. It finds that multicore SMP clusters have two conflicting effects: a faster intra-socket communication path can speed up collectives, while memory and cache contention can degrade communication performance. Based on these characteristics, namely shared caches and memory contention, the paper proposes multicore-aware collective optimization techniques: hierarchy-aware algorithms, limited concurrency, NUMA-aware algorithms, and cache-friendly optimization. These methods are implemented and evaluated in MPI_Barrier, MPI_Bcast, and MPI_Alltoall. Experiments show that the optimizations exploit the multicore structure effectively, reduce the impact of contention, and improve the performance and scalability of collective communication in multicore environments.
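The hierarchy-aware idea named in the abstract can be illustrated with a toy message-count model. The sketch below is not the paper's implementation. It assumes a block rank-to-node mapping (ranks 0..c-1 on node 0, and so on), one leader rank per node, and a flat binomial tree in which the sending distance doubles each round. All function names are hypothetical, and other tree orderings or rank mappings would give different counts.

```python
def node_of(rank, cores_per_node):
    """Node index of a rank under the assumed block mapping."""
    return rank // cores_per_node


def binomial_edges(members):
    """(src, dst) pairs of a binomial-tree broadcast rooted at members[0].

    In each round, every member that already holds the data sends to the
    member `have` positions further along, so the holder count doubles.
    """
    edges = []
    have = 1
    n = len(members)
    while have < n:
        for i in range(min(have, n - have)):
            edges.append((members[i], members[i + have]))
        have *= 2
    return edges


def count_inter_node(edges, cores_per_node):
    """Number of messages whose endpoints sit on different nodes."""
    return sum(
        1
        for src, dst in edges
        if node_of(src, cores_per_node) != node_of(dst, cores_per_node)
    )


def flat_bcast(num_ranks, cores_per_node):
    """Flat broadcast: one binomial tree over all ranks, ignoring topology."""
    return binomial_edges(list(range(num_ranks)))


def hierarchical_bcast(num_ranks, cores_per_node):
    """Two-level broadcast: across node leaders first, then inside each node."""
    leaders = list(range(0, num_ranks, cores_per_node))
    edges = binomial_edges(leaders)  # inter-node phase
    for leader in leaders:
        last = min(leader + cores_per_node, num_ranks)
        edges += binomial_edges(list(range(leader, last)))  # intra-node phase
    return edges
```

For 16 ranks on 4-core nodes, both schedules in this model send 15 messages, but the flat tree sends 12 of them between nodes while the two-level schedule sends 3. The rest travel over the faster intra-node path that the abstract refers to.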
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2010, No. 2, pp. 317-325 (9 pages)
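Funding: Supported by the National High Technology Research and Development Program of China (863 Program), project "Dawning 5000A High-Productivity Computer" (2006AA01A102)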
Keywords: HPC; multicore clusters; collectives optimization; NUMA; MPI


