
Implementing High-Speed Communication for the Hybrid Hierarchy File System H^2FS on the Tianhe-2 High-Speed Interconnect (cited by: 7)

The Implementation of Communicating Operation in Hybrid Hierarchy File System H^2FS with TH-Express 2
Abstract Communication performance is a key factor in the performance and efficiency of a parallel file system. This paper designs and implements FSE, the high-speed communication module of the Hybrid Hierarchy File System H^2FS, the parallel file system used on the Tianhe-2 (TH-2) supercomputer. FSE is built on that system's high-performance interconnect, TH-Express 2. It combines TH-Express 2 short-message communication (Mini Packet, MP) with Remote Direct Memory Access (RDMA) to carry traffic between clients (compute nodes) and I/O nodes (IONs). Its optimizations are a memory registration pool based on a dynamic linked list, flow control based on variable credits, multidimensional multithreaded service, and adaptive endpoint management. The memory registration pool reduces communication latency and improves efficiency. Variable-credit flow control lowers the likelihood of congestion among endpoints and improves scalability and stability. To exploit the parallel processing capacity of multicore processors and the transfer capacity of multiple RDMA engines, FSE uses multidimensional multithreading for message transmission and data processing, which raises data-access bandwidth. Adaptive endpoint management targets fault tolerance at communication endpoints, making the system more resilient and stable.
With these optimizations, FSE delivers low-latency, high-bandwidth, scalable data access. Tests on two different real systems show that FSE makes full use of TH-Express 2, reaching a point-to-point data-access bandwidth of 8.6 Gbps and giving H^2FS high data-access bandwidth and metadata performance with good scalability. Compared with a TCP-based communication module, FSE's read and write latencies can be as low as 55% and 20% of the TCP-based values, respectively, and its peak write performance with a single ION is 3.3 times that of the TCP-based module. Compared with the Lustre file system, FSE makes better use of the high-speed network, with write latency as low as 28.6% of Lustre's.
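The abstract names variable-credit flow control but does not describe its algorithm. As a rough illustration of the general technique only, the sketch below models sender-side credit accounting for one endpoint: a message is sent only when a credit is available, and the receiver may return a different number of credits each time. The class `CreditFlowControl`, its methods, and the in-memory `sent` list standing in for the network are all hypothetical and are not taken from FSE.

```python
from collections import deque


class CreditFlowControl:
    """Sender-side credit accounting for one endpoint (illustrative only)."""

    def __init__(self, initial_credits: int):
        self.credits = initial_credits  # messages the receiver can still accept
        self.pending = deque()          # messages waiting for a credit
        self.sent = []                  # stands in for the network (assumption)

    def send(self, msg) -> bool:
        """Send now if a credit is available, otherwise queue the message.

        Returns True if the message was sent immediately.
        """
        if self.credits > 0:
            self.credits -= 1
            self.sent.append(msg)
            return True
        self.pending.append(msg)
        return False

    def grant(self, n: int) -> None:
        """Receiver returns n credits; n may vary from call to call."""
        self.credits += n
        # Drain queued messages in FIFO order while credits remain.
        while self.credits > 0 and self.pending:
            self.credits -= 1
            self.sent.append(self.pending.popleft())


if __name__ == "__main__":
    fc = CreditFlowControl(initial_credits=2)
    for m in ("a", "b", "c"):
        fc.send(m)      # "c" is queued because only two credits exist
    fc.grant(1)         # one credit comes back, so "c" is released
    print(fc.sent)
```

Because the sender never has more messages in flight than the receiver has granted, a busy receiver can slow its senders simply by granting fewer credits, which is the usual reason such schemes reduce congestion.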
Source Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2017, No. 9, pp. 1961-1979 (19 pages)
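Funding National Natural Science Foundation of China (61120106005); National High Technology Research and Development Program of China ("863" Program) (2012AA01A301); National Key Research and Development Program of China (2016YFB0200400); Ministry of Science and Technology Major Project on Cloud Computing and Big Data (2016YFB1000302)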
Keywords hybrid hierarchy file system; H^2FS; FSE; TH-Express 2 (Tianhe-2 high-speed interconnect network)
