A Distributed Persistent Memory File System Based on RDMA Multicast
Abstract Advances in persistent memory and remote direct memory access (RDMA) open new opportunities for designing efficient distributed systems. However, existing RDMA-based distributed systems do not fully exploit RDMA's multicast capability, so in one-to-many scenarios they transmit multiple copies of the same file data, which seriously degrades system performance. To address this problem, this paper proposes MTFS, a distributed persistent memory file system based on RDMA multicast transmission. MTFS uses a low-latency multicast communication mechanism that makes full use of RDMA multicast to deliver data efficiently to multiple data nodes, avoiding the high latency of multi-copy transmission. To make transmission operations more flexible, MTFS provides a multi-mode multicast remote procedure call (RPC) mechanism that adaptively recognizes RPC requests and, by optimizing the return mechanism, moves part of the transmission work off the critical path, further improving transmission efficiency. MTFS also provides a lightweight consistency guarantee mechanism: through crash recovery, a data verification system, a retransmission strategy, and a window mechanism, it recovers quickly when a node crashes and precisely detects and corrects transmission errors, ensuring data reliability and consistency. Experiments show that MTFS improves throughput by 10.2 to 219 times over the existing system GlusterFS across the tested benchmarks. Under a Redis database workload, MTFS outperforms NOVA by up to 10.7%, and it scales well in multi-threaded tests.
Authors Chen Maotang (陈茂棠); Zheng Sheng'an (郑圣安); You Litong (游理通); Wang Jingyu (王晶钰); Yan Tian (闫田); Tu Yaofeng (屠要峰); Han Yinjun (韩银俊); Huang Linpeng (黄林鹏) (Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240; Department of Computer Science and Technology, Tsinghua University, Beijing 100084; ZTE Corporation, Nanjing 210012)
Source Journal of Computer Research and Development (《计算机研究与发展》), 2021, No. 2, pp. 384-396 (13 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding National Key Research and Development Program of China (2018YFB1003302); Shanghai Jiao Tong University-Huawei Joint Laboratory Project (FA2018091021-202004).
Keywords persistent memory; remote direct memory access; multicast; distributed file system; remote procedure call
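
The abstract describes sending each piece of data once to several data nodes, verifying it on arrival, and retransmitting within a window when an error is detected. The following is a minimal sketch of that general idea as a pure-Python simulation, with no RDMA involved. The names `Receiver` and `multicast_write`, the use of CRC32 as the check, and the chunk and window sizes are all illustrative assumptions, not MTFS's actual design or API.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical data node that keeps verified chunks by sequence number."""
    name: str
    # Sequence numbers to corrupt on first delivery (simulated link errors).
    corrupt_once: set = field(default_factory=set)
    chunks: dict = field(default_factory=dict)

    def deliver(self, seq, payload, checksum):
        """Return True (ACK) if the payload matches its CRC32, else False (NACK)."""
        if seq in self.corrupt_once:
            self.corrupt_once.discard(seq)
            # Flip the first byte to simulate a transmission error.
            payload = bytes([payload[0] ^ 0xFF]) + payload[1:]
        if zlib.crc32(payload) != checksum:
            return False
        self.chunks[seq] = payload
        return True

    def data(self):
        """Reassemble the stored chunks in sequence order."""
        return b"".join(self.chunks[s] for s in sorted(self.chunks))


def multicast_write(data, receivers, chunk_size=4, window=2, max_retries=3):
    """Send data to every receiver, one window of chunks at a time.

    Each send is counted once no matter how many receivers it reaches,
    which is the property multicast is assumed to provide here.
    Returns the total number of sends, including retransmissions.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    sends = 0
    for base in range(0, len(chunks), window):
        pending = set(range(base, min(base + window, len(chunks))))
        for _ in range(max_retries + 1):
            if not pending:
                break
            still_pending = set()
            for seq in sorted(pending):
                sends += 1
                checksum = zlib.crc32(chunks[seq])
                acks = [r.deliver(seq, chunks[seq], checksum) for r in receivers]
                if not all(acks):
                    # At least one receiver rejected the chunk: resend it.
                    still_pending.add(seq)
            pending = still_pending
        if pending:
            raise IOError(f"chunks {sorted(pending)} failed after {max_retries} retries")
    return sends
```

As a usage example, writing the 17-byte string `b"persistent-memory"` to three receivers, one of which corrupts chunk 1 on first delivery, takes 6 sends in this model: 5 chunks plus 1 retransmission. Sending a separate copy to each receiver would take at least 15 sends for the same data, which is the multi-copy cost the abstract refers to.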