
Optimization of the Key-Value Storage System Based on Fused User-Level I/O

Original title: 基于用户级融合I/O的Key-Value存储系统优化技术研究
Cited by: 6
Abstract: Traditional distributed key-value storage systems are mostly built on the socket and POSIX I/O interfaces provided by the operating system. Constrained by interface semantics and kernel overhead, such systems struggle to exploit the high throughput and low latency of modern network and storage hardware. Focusing on the data path of key-value storage systems, we propose a fused user-level I/O approach for high-speed Ethernet and NVMe (non-volatile memory express) SSDs: the network stack and the storage I/O stack are integrated in user space and co-designed to improve throughput and latency consistency. In the control plane of the fused I/O stack, a single processor core manages the hardware queues of both the NIC and the SSD within a single context. This removes the repeated kernel entries and exits, interrupts, context switches, and potential inter-core communication and data migration of the traditional separated design, minimizing the management overhead of system software. The data plane uses a unified memory pool; with user-level device drivers, data is transferred directly by DMA between the key-value system and the devices, with no extra data copies and no operating-system intervention. For requests with large payloads, the data is sliced and the network and storage DMA operations are overlapped in a pipeline, which further hides access latency. We present UKV, an all-in-userland key-value storage system that supports a two-level DRAM-SSD storage hierarchy and the widely used Memcache interface. UKV is compared experimentally with Fatcache, the system open-sourced by Twitter. The results show that, relative to Fatcache, the QPS (queries per second) of SSD-involved SET requests increases by 14.97% to 97.78% and the QPS of GET operations increases by 14.60% to 51.81%; the p95 latency of SSD-involved SET requests decreases by 26.12% to 40.90% and the p95 latency of GET operations decreases by 15.10% to 24.36%.
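The latency-hiding effect of slicing a large payload and overlapping the network and storage DMA stages can be illustrated with a small timing model. This is only a sketch under simplifying assumptions and not the UKV implementation: the function name `transfer_time`, the fixed per-chunk stage costs `t_net` and `t_ssd`, and the sample values are all hypothetical and do not come from the paper.

```python
def transfer_time(num_chunks, t_net, t_ssd, overlap):
    """Time to move a payload through a network DMA stage, then a storage DMA stage.

    Hypothetical model: the payload is split into num_chunks equal chunks,
    each costing t_net in the network stage and t_ssd in the storage stage.
    """
    if not overlap:
        # Serial model: each chunk finishes both stages before the next one starts.
        return num_chunks * (t_net + t_ssd)

    net_done = 0.0  # time at which the network stage finished its latest chunk
    ssd_done = 0.0  # time at which the storage stage finished its latest chunk
    for _ in range(num_chunks):
        # The network stage handles chunks back to back.
        net_done += t_net
        # The storage stage starts a chunk once that chunk has arrived
        # and the previous chunk has left the storage stage.
        ssd_done = max(net_done, ssd_done) + t_ssd
    return ssd_done


if __name__ == "__main__":
    # Illustrative numbers only (arbitrary time units).
    serial = transfer_time(4, 10, 10, overlap=False)
    pipelined = transfer_time(4, 10, 10, overlap=True)
    print(f"serial={serial}, pipelined={pipelined}")
```

In this model, four chunks at 10 time units per stage take 80 units without overlap and 50 units with overlap. With a single chunk the two variants coincide, which matches the abstract's point that the technique targets requests with large payloads.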
Authors: An Zhongqi (安仲奇); Zhang Yunyao (张云尧); Xing Jing (邢晶); Huo Zhigang (霍志刚). Affiliations: State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049
Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, and the Peking University Core Journals list), 2020, No. 3, pp. 649-659 (11 pages)
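Funding: National Key Research and Development Program of China (2018YFC0809300); National Natural Science Foundation of China, Young Scientists Fund (61502454)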
Keywords: key-value storage system; kernel bypass; user-level fused I/O; high-speed Ethernet; NVMe SSD