Log-structured merge tree has been adopted by many distributed storage systems. It decomposes a large database into multiple parts: an in?writing part and several read-only ones. Records are firstly written into a mem...Log-structured merge tree has been adopted by many distributed storage systems. It decomposes a large database into multiple parts: an in?writing part and several read-only ones. Records are firstly written into a memoryoptimized structure and then compacted into in-disk struc? tures periodically. It achieves high write throughput. However, it brings side effect that read requests have to go through multiple structures to find the required record. In a distributed database system, different parts of the LSM-tree are stored in distributed fashion. To this end, a server in the query layer has to issues multiple network communications to pull data items from the underlying storage layer. Coming to its rescue, this work proposes a precise data access strategy which includes: an efficient structure with low maintaining overhead designed to test whether a record exists in the in?writing part of the LSM-tree;a lease-based synchronization strategy proposed to maintain consistent copies of the structure on remote query servers. We further prove the technique is capable of working robustly when the LSM-Tree is re?organizing multiple structures in the backend. It is also fault-tolerant, which is able to recover the structures used in data access after node failures happen. Experiments using the YCSB benchmark show that the solution has 6x throughput improvement over existing methods.展开更多
Based on a log-structured merge(LSM)tree,the key-value(KV)storage system can provide high reading performance and optimize random writing performance.It is widely used in modern data storage systems like e-commerce,on...Based on a log-structured merge(LSM)tree,the key-value(KV)storage system can provide high reading performance and optimize random writing performance.It is widely used in modern data storage systems like e-commerce,online analytics,and real-time communication.An LSM tree stores new KV data in the memory and flushes to disk in batches.To prevent data loss in memory if there is an unexpected crash,RocksDB appends updating data in the write-ahead log(WAL)before updating the memory.However,synchronous WAL significantly reduces writing performance.In this paper,we present a new WAL mechanism named MyWAL.It directly manages raw devices(or partitions)instead of saving data on a traditional file system.These can avoid useless metadata updating and write data sequentially on disks.Experimental results show that MyWAL can significantly improve the data writing performance of RocksDB compared to the traditional WAL for small KV data on solid-state disks(SSDs),as much as five to eight times faster.On non-volatile memory express soild-state drives(NVMe SSDs)and non-volatile memory(NVM),MyWAL can improve data writing performance by 10%–30%.Furthermore,the results of YCSB(Yahoo!Cloud Serving Benchmark)show that the latency decreased by 50%compared with SpanDB.展开更多
基金National Hightech R&D Program (2015AA015307)the National Natural Science Foundation of China (Grant Nos. 61702189, 61432006 and 61672232)Youth Science and Technology -“Yang Fan” Program of Shanghai (17YF1427800).
文摘Log-structured merge tree has been adopted by many distributed storage systems. It decomposes a large database into multiple parts: an in?writing part and several read-only ones. Records are firstly written into a memoryoptimized structure and then compacted into in-disk struc? tures periodically. It achieves high write throughput. However, it brings side effect that read requests have to go through multiple structures to find the required record. In a distributed database system, different parts of the LSM-tree are stored in distributed fashion. To this end, a server in the query layer has to issues multiple network communications to pull data items from the underlying storage layer. Coming to its rescue, this work proposes a precise data access strategy which includes: an efficient structure with low maintaining overhead designed to test whether a record exists in the in?writing part of the LSM-tree;a lease-based synchronization strategy proposed to maintain consistent copies of the structure on remote query servers. We further prove the technique is capable of working robustly when the LSM-Tree is re?organizing multiple structures in the backend. It is also fault-tolerant, which is able to recover the structures used in data access after node failures happen. Experiments using the YCSB benchmark show that the solution has 6x throughput improvement over existing methods.
基金Project supported by the National Key Research and Development Project of China(No.2022YFB2702101)the Shaanxi Province Key Industrial Projects,China(Nos.2021ZDLGY03-02 and 2021ZDLGY03-08)the National Natural Science Foundation of China(No.92152301)。
文摘Based on a log-structured merge(LSM)tree,the key-value(KV)storage system can provide high reading performance and optimize random writing performance.It is widely used in modern data storage systems like e-commerce,online analytics,and real-time communication.An LSM tree stores new KV data in the memory and flushes to disk in batches.To prevent data loss in memory if there is an unexpected crash,RocksDB appends updating data in the write-ahead log(WAL)before updating the memory.However,synchronous WAL significantly reduces writing performance.In this paper,we present a new WAL mechanism named MyWAL.It directly manages raw devices(or partitions)instead of saving data on a traditional file system.These can avoid useless metadata updating and write data sequentially on disks.Experimental results show that MyWAL can significantly improve the data writing performance of RocksDB compared to the traditional WAL for small KV data on solid-state disks(SSDs),as much as five to eight times faster.On non-volatile memory express soild-state drives(NVMe SSDs)and non-volatile memory(NVM),MyWAL can improve data writing performance by 10%–30%.Furthermore,the results of YCSB(Yahoo!Cloud Serving Benchmark)show that the latency decreased by 50%compared with SpanDB.