Funding: This work is supported by the National Key Research and Development Program of China under Grant No. 2016YFB1000202 and the National Natural Science Foundation of China under Grant Nos. 61303056 and 61379042.
Abstract: Key-value (KV) stores have become a backbone of large-scale applications in today's data centers. Write-optimized data structures like the Log-Structured Merge-tree (LSM-tree) and its variants are widely used in KV storage systems like BigTable and RocksDB. A conventional LSM-tree organizes KV items into multiple, successively larger components, and uses compaction to push KV items from a smaller component to the adjacent larger component until the KV items reach the largest component. Unfortunately, the current compaction scheme incurs significant write amplification due to repeated KV item reads and writes, which results in poor throughput. We propose a new compaction scheme, delayed compaction (dCompaction), that decreases write amplification. dCompaction postpones some compactions and gathers them into the following compaction. In this way, it avoids KV item reads and writes during compaction, and consequently improves the throughput of LSM-tree based KV stores. We implement dCompaction on RocksDB and conduct extensive experiments. Validation using the YCSB framework shows that, compared with RocksDB, dCompaction improves write performance by about 40% and delivers comparable read performance.
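To make the write-amplification argument concrete, the following is a minimal counting model, not the dCompaction algorithm itself. It assumes a toy two-component store with distinct keys, where every merge rewrites the whole larger component; the function name `simulate` and its parameters are hypothetical. It only illustrates why gathering several pending merges into one can lower the number of items written.

```python
def simulate(num_flushes, batch_items, merge_every):
    """Return write amplification for a toy two-component store.

    Assumptions (illustrative only): keys never repeat, each flush writes
    `batch_items` new items once, and each merge rewrites the entire
    larger component together with the items waiting to be merged.
    """
    user_items = 0    # items the application asked to write
    written = 0       # items physically written (flushes + merges)
    level_items = 0   # items currently in the larger component
    pending = 0       # flushed items not yet merged
    for i in range(1, num_flushes + 1):
        user_items += batch_items
        written += batch_items      # the flush itself
        pending += batch_items
        if i % merge_every == 0:    # merge now, or postpone
            level_items += pending
            written += level_items  # merge rewrites the whole component
            pending = 0
    return written / user_items


eager = simulate(num_flushes=4, batch_items=100, merge_every=1)
delayed = simulate(num_flushes=4, batch_items=100, merge_every=2)
print(eager, delayed)  # 3.5 2.5
```

In this model, merging after every flush writes each item 3.5 times on average, while merging after every second flush writes it 2.5 times. The model ignores read cost and the extra lookups that unmerged data may cause.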
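Abstract: Storage class memory (SCM), which offers high performance and non-volatility, is maturing and beginning to be used in storage system design, while traditional SSDs still hold an advantage in capacity and provide key-value stores with large-capacity storage. Existing key-value stores cannot fully exploit a hybrid SCM and SSD storage architecture, so the data layout and system structure need to be redesigned. Based on the characteristics of SCM and SSD, an efficient hybrid key-value store built on SCM and SSD is designed (SCM and SSD Hybrid Key-Value store, SSHKV). SSHKV stores the metadata of the key-value store in SCM and stores the data in SSD as a log, balancing performance and capacity. For SSD space management, SSHKV adopts a logical space amplification strategy: by remapping the invalid space released by TRIM commands, it reduces the data migration overhead caused by garbage collection. SSHKV is implemented on a half-async/half-sync I/O model. In comparative tests, SSHKV improves random write performance by about 20 times over LevelDB, a traditional LSM-tree based store.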
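The split between metadata and log-structured data described above can be sketched as follows. This is an illustrative layout only, not SSHKV's implementation: a Python dict stands in for the metadata kept in SCM, an in-memory byte buffer stands in for the SSD log, and the class name `LogStructuredStore` and its methods are hypothetical.

```python
import io


class LogStructuredStore:
    """Toy store: small index (metadata) plus an append-only value log."""

    def __init__(self):
        self.log = io.BytesIO()  # stand-in for the SSD data log
        self.index = {}          # stand-in for SCM metadata: key -> (offset, length)

    def put(self, key, value: bytes):
        # Values are only ever appended; an overwrite leaves the old
        # bytes in the log as garbage and repoints the index entry.
        self.log.seek(0, io.SEEK_END)
        offset = self.log.tell()
        self.log.write(value)
        self.index[key] = (offset, len(value))

    def get(self, key):
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, length = entry
        self.log.seek(offset)
        return self.log.read(length)

    def garbage_bytes(self):
        # Bytes in the log that no index entry points to any more.
        self.log.seek(0, io.SEEK_END)
        total = self.log.tell()
        live = sum(length for _, length in self.index.values())
        return total - live


store = LogStructuredStore()
store.put("a", b"111")
store.put("a", b"22")
print(store.get("a"), store.garbage_bytes())  # b'22' 3
```

The garbage left behind by overwrites is what a garbage collector would normally reclaim by migrating live data; the abstract states that SSHKV reduces that migration by remapping space released through TRIM, which this sketch does not model.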
Funding: Project supported by the National Key Research and Development Project of China (No. 2022YFB2702101), the Shaanxi Province Key Industrial Projects, China (Nos. 2021ZDLGY03-02 and 2021ZDLGY03-08), and the National Natural Science Foundation of China (No. 92152301).
Abstract: Based on a log-structured merge (LSM) tree, the key-value (KV) storage system can provide high read performance and optimize random write performance. It is widely used in modern data storage systems for e-commerce, online analytics, and real-time communication. An LSM tree stores new KV data in memory and flushes it to disk in batches. To prevent the loss of in-memory data in an unexpected crash, RocksDB appends each update to the write-ahead log (WAL) before updating memory. However, a synchronous WAL significantly reduces write performance. In this paper, we present a new WAL mechanism named MyWAL. It directly manages raw devices (or partitions) instead of saving data on a traditional file system. This avoids unnecessary metadata updates and writes data sequentially on disk. Experimental results show that MyWAL can significantly improve the write performance of RocksDB compared to the traditional WAL for small KV data on solid-state disks (SSDs), by as much as five to eight times. On non-volatile memory express solid-state drives (NVMe SSDs) and non-volatile memory (NVM), MyWAL can improve write performance by 10%–30%. Furthermore, the results of YCSB (Yahoo! Cloud Serving Benchmark) show that latency decreases by 50% compared with SpanDB.
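The log-before-memory ordering that the abstract attributes to the traditional WAL can be sketched as below. This is a minimal file-based illustration, not MyWAL and not RocksDB's WAL format: the class `TinyWalStore`, its record layout (two little-endian 32-bit lengths followed by key and value), and the `sync` flag are assumptions made for the example.

```python
import os
import struct


class TinyWalStore:
    """Toy KV store: append each update to a log file, then update memory."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.memtable = {}
        self._replay()                    # recover state from an existing log
        self.wal = open(wal_path, "ab")

    def put(self, key: bytes, value: bytes, sync: bool = True):
        record = struct.pack("<II", len(key), len(value)) + key + value
        self.wal.write(record)            # 1. append to the log first
        self.wal.flush()
        if sync:
            os.fsync(self.wal.fileno())   # synchronous WAL: wait for the device
        self.memtable[key] = value        # 2. only then update memory

    def get(self, key: bytes):
        return self.memtable.get(key)

    def close(self):
        self.wal.close()

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, "rb") as f:
            data = f.read()
        pos = 0
        while pos + 8 <= len(data):
            klen, vlen = struct.unpack_from("<II", data, pos)
            end = pos + 8 + klen + vlen
            if end > len(data):
                break                     # ignore a torn record at the tail
            key = data[pos + 8:pos + 8 + klen]
            self.memtable[key] = data[pos + 8 + klen:end]
            pos = end
```

Reopening a `TinyWalStore` on the same path rebuilds the in-memory table from the log, which is the crash-recovery property the WAL exists for. The `os.fsync` call on every `put` is the per-write cost that a synchronous WAL pays; going through a file system, as this sketch does, is exactly what the abstract says MyWAL avoids by managing raw devices.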