Abstract
Communication performance is a key factor in the performance and efficiency of a parallel file system. This paper designs and implements FSE, the high-speed communication module of the Hybrid Hierarchy File System H^2FS, the parallel file system used on the Tianhe 2 (TH-2) supercomputer. FSE is built on TH-Express 2, the Tianhe 2 high-speed interconnection network, and combines Mini Packet (MP) short-message communication with Remote Direct Memory Access (RDMA) to carry traffic between clients on the compute nodes and the I/O nodes (IONs). FSE applies four optimizations. A memory registration pool based on a dynamic list reduces communication latency and improves efficiency. Flow control based on variable credits lowers the likelihood of congestion among endpoints and improves scalability and stability. To exploit the parallel processing capability of multicore processors and the transfer capability of multiple RDMA engines, FSE uses a multidimensional, multi-threaded design for packet transmission and data processing, which raises data access bandwidth. Adaptive endpoint management targets fault tolerance of communication endpoints, making the system more resilient and stable. Together, these optimizations give FSE low-latency, high-bandwidth, and highly scalable data access. Experiments on two real systems show that FSE makes full use of the features of TH-Express 2 and achieves a point-to-point data access bandwidth of up to 8.6 Gbps, giving H^2FS high data access bandwidth and metadata performance with good scalability. Compared with a TCP-based implementation of the communication module, FSE's read and write latencies are as low as 55% and 20% of the TCP-based figures, respectively, and its peak single-ION write performance is 3.3 times that of the TCP-based module. Compared with the Lustre file system, FSE makes better use of the high-speed network, with write latency as low as 28.6% of Lustre's.
Source
《计算机学报》
EI
CSCD
PKU Core (Peking University Core Journals list)
2017, Issue 9, pp. 1961-1979 (19 pages)
Chinese Journal of Computers
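Funding
National Natural Science Foundation of China (61120106005)
National High Technology Research and Development Program of China ("863" Program) (2012AA01A301)
National Key Research and Development Program of China (2016YFB0200400)
Ministry of Science and Technology Major Special Project on Cloud Computing and Big Data (2016YFB1000302)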