In recent years,the research field of data collection under local differential privacy(LDP)has expanded its focus fromelementary data types to includemore complex structural data,such as set-value and graph data.Howev...In recent years,the research field of data collection under local differential privacy(LDP)has expanded its focus fromelementary data types to includemore complex structural data,such as set-value and graph data.However,our comprehensive review of existing literature reveals that there needs to be more studies that engage with key-value data collection.Such studies would simultaneously collect the frequencies of keys and the mean of values associated with each key.Additionally,the allocation of the privacy budget between the frequencies of keys and the means of values for each key does not yield an optimal utility tradeoff.Recognizing the importance of obtaining accurate key frequencies and mean estimations for key-value data collection,this paper presents a novel framework:the Key-Strategy Framework forKey-ValueDataCollection under LDP.Initially,theKey-StrategyUnary Encoding(KS-UE)strategy is proposed within non-interactive frameworks for the purpose of privacy budget allocation to achieve precise key frequencies;subsequently,the Key-Strategy Generalized Randomized Response(KS-GRR)strategy is introduced for interactive frameworks to enhance the efficiency of collecting frequent keys through group-anditeration methods.Both strategies are adapted for scenarios in which users possess either a single or multiple key-value pairs.Theoretically,we demonstrate that the variance of KS-UE is lower than that of existing methods.These claims are substantiated through extensive experimental evaluation on real-world datasets,confirming the effectiveness and efficiency of the KS-UE and KS-GRR strategies.展开更多
随着互联网技术的迅猛发展,越来越多的非结构化数据涌入到人们的生活中,为这些数据建立高效的索引面临极大的挑战.键值数据库Key-Value以其结构简单和高扩展性而引起人们的广泛关注,已成为海量数据存储系统中的重要组成部分.由于Key-Va...随着互联网技术的迅猛发展,越来越多的非结构化数据涌入到人们的生活中,为这些数据建立高效的索引面临极大的挑战.键值数据库Key-Value以其结构简单和高扩展性而引起人们的广泛关注,已成为海量数据存储系统中的重要组成部分.由于Key-Value系统对吞吐量要求较高,而基于Flash的固态硬盘(solid state drive,SSD)能够提供很高的随机读性能,在SSD上构建Key-Value系统已成为海量数据存储领域的一大研究热点.鉴于Flash具有非定点更新、寿命有限等特性,基于SSD的KeyValue系统必须针对Flash的特性作专门优化.以一种称为SkimpyStash的基于SSD的Key-Value系统为基础,提出了一种新的Key-Value系统低延迟存储系统(low latency store,LLStore).LLStore使用内存文件映射技术来减少针对SSD的IO请求,除此之外,针对SkimpyStash中低效的压缩策略,提出一种改进方法,可以在少量增加内存开销的情况下极大地减少查询时间.通过与原系统的性能比较实验,LLStore在平均查询时间上可以获得至少12%的加速.展开更多
The key-value store can provide flexibility of data types because it does not need to specify the data types to be stored in advance and can store any types of data as the value of the key-value pair.Various types of ...The key-value store can provide flexibility of data types because it does not need to specify the data types to be stored in advance and can store any types of data as the value of the key-value pair.Various types of studies have been conducted to improve the performance of the key-value store while maintaining its flexibility.However,the research efforts storing the large-scale values such as multimedia data files(e.g.,images or videos)in the key-value store were limited.In this study,we propose a new key-value store,WR-Store++aiming to store the large-scale values stably.Specifically,it provides a new design of separating data and index by working with the built-in data structure of the Windows operating system and the file system.The utilization of the built-in data structure of the Windows operating system achieves the efficiency of the key-value store and that of the file system extends the limited space of the storage significantly.We also present chunk-based memory management and parallel processing of WR-Store++to further improve its performance in the GET operation.Through the experiments,we show that WR-Store++can store at least 32.74 times larger datasets than the existing baseline key-value store,WR-Store,which has the limitation in storing large-scale data sets.Furthermore,in terms of processing efficiency,we show that WR-Store++outperforms not only WR-Store but also the other state-ofthe-art key-value stores,LevelDB,RocksDB,and BerkeleyDB,for individual key-value operations and mixed workloads.展开更多
The structure of key-value data is a typical data structure generated by mobile devices.The collection and analysis of the data from mobile devices are critical for service providers to improve service quality.Neverth...The structure of key-value data is a typical data structure generated by mobile devices.The collection and analysis of the data from mobile devices are critical for service providers to improve service quality.Nevertheless,collecting raw data,which may contain various per⁃sonal information,would lead to serious personal privacy leaks.Local differential privacy(LDP)has been proposed to protect privacy on the device side so that the server cannot obtain the raw data.However,existing mechanisms assume that all keys are equally sensitive,which can⁃not produce high-precision statistical results.A utility-improved data collection framework with LDP for key-value formed mobile data is pro⁃posed to solve this issue.More specifically,we divide the key-value data into sensitive and non-sensitive parts and only provide an LDPequivalent privacy guarantee for sensitive keys and all values.We instantiate our framework by using a utility-improved key value-unary en⁃coding(UKV-UE)mechanism based on unary encoding,with which our framework can work effectively for a large key domain.We then vali⁃date our mechanism which provides better utility and is suitable for mobile devices by evaluating it in two real datasets.Finally,some pos⁃sible future research directions are envisioned.展开更多
As more and more application systems related to big data were developed, NoSQL (Not Only SQL) database systems are becoming more and more popular. In order to add transaction features for some NoSQL database systems, ...As more and more application systems related to big data were developed, NoSQL (Not Only SQL) database systems are becoming more and more popular. In order to add transaction features for some NoSQL database systems, many scholars have tried different techniques. Unfortunately, there is a lack of research on Redis’s transaction in the existing literatures. This paper proposes a transaction model for key-value NoSQL databases including Redis to make possible allowing users to access data in the ACID (Atomicity, Consistency, Isolation and Durability) way, and this model is vividly called the surfing concurrence transaction model. The architecture, important features and implementation principle are described in detail. The key algorithms also were given in the form of pseudo program code, and the performance also was evaluated. With the proposed model, the transactions of Key-Value NoSQL databases can be performed in a lock free and MVCC (Multi-Version Concurrency Control) free manner. This is the result of further research on the related topic, which fills the gap ignored by relevant scholars in this field to make a little contribution to the further development of NoSQL technology.展开更多
The tail latency of end-user requests,which directly impacts the user experience and the revenue,is highly related to its corresponding numerous accesses in key-value stores.The replica selection algorithm is crucial ...The tail latency of end-user requests,which directly impacts the user experience and the revenue,is highly related to its corresponding numerous accesses in key-value stores.The replica selection algorithm is crucial to cut the tail latency of these key-value accesses.Recently,the C3 algorithm,which creatively piggybacks the queue-size of waiting keys from replica servers for the replica selection at clients,is proposed in NSDI 2015.Although C3 improves the tail latency a lot,it suffers from the timeliness issue on the feedback information,which directly influences the replica selection.In this paper,we analysis the evaluation of queuesize of waiting keys of C3,and some findings of queue-size variation were made.It motivate us to propose the Prediction-Based Replica Selection(PRS)algorithm,which predicts the queue-size at replica servers under the poor timeliness condition,instead of utilizing the exponentially weighted moving average of the state piggybacked queue-size as in C3.Consequently,PRS can obtain more accurate queue-size at clients than C3,and thus outperforms C3 in terms of cutting the tail latency.Simulation results confirm the advantage of PRS over C3.展开更多
Based on a log-structured merge(LSM)tree,the key-value(KV)storage system can provide high reading performance and optimize random writing performance.It is widely used in modern data storage systems like e-commerce,on...Based on a log-structured merge(LSM)tree,the key-value(KV)storage system can provide high reading performance and optimize random writing performance.It is widely used in modern data storage systems like e-commerce,online analytics,and real-time communication.An LSM tree stores new KV data in the memory and flushes to disk in batches.To prevent data loss in memory if there is an unexpected crash,RocksDB appends updating data in the write-ahead log(WAL)before updating the memory.However,synchronous WAL significantly reduces writing performance.In this paper,we present a new WAL mechanism named MyWAL.It directly manages raw devices(or partitions)instead of saving data on a traditional file system.These can avoid useless metadata updating and write data sequentially on disks.Experimental results show that MyWAL can significantly improve the data writing performance of RocksDB compared to the traditional WAL for small KV data on solid-state disks(SSDs),as much as five to eight times faster.On non-volatile memory express soild-state drives(NVMe SSDs)and non-volatile memory(NVM),MyWAL can improve data writing performance by 10%–30%.Furthermore,the results of YCSB(Yahoo!Cloud Serving Benchmark)show that the latency decreased by 50%compared with SpanDB.展开更多
The virtual test platform is a vital tool for ship simulation and testing.However,the numerical pool ship virtual test platform is a complex system that comprises multiple heterogeneous data types,such as relational d...The virtual test platform is a vital tool for ship simulation and testing.However,the numerical pool ship virtual test platform is a complex system that comprises multiple heterogeneous data types,such as relational data,files,text,images,and animations.The analysis,evaluation,and decision-making processes heavily depend on data,which continue to increase in size and complexity.As a result,there is an increasing need for a distributed database system to manage these data.In this paper,we propose a Key-Value database based on a distributed system that can operate on any type of data,regardless of its size or type.This database architecture supports class column storage and load balancing and optimizes the efficiency of I/O bandwidth and CPU resource utilization.Moreover,it is specif-ically designed to handle the storage and access of largefiles.Additionally,we propose a multimodal data fusion mechanism that can connect various descrip-tions of the same substance,enabling the fusion and retrieval of heterogeneous multimodal data to facilitate data analysis.Our approach focuses on indexing and storage,and we compare our solution with Redis,MongoDB,and MySQL through experiments.We demonstrate the performance,scalability,and reliability of our proposed database system while also analysing its architecture’s defects and providing optimization solutions and future research directions.In conclu-sion,our database system provides an efficient and reliable solution for the data management of the virtual test platform of numerical pool ships.展开更多
Memory-based key-value cache systems, such as Memcached and Redis, have become indispensable components of data center infrastructures and have been used to cache performance-critical data to avoid expensive back-end ...Memory-based key-value cache systems, such as Memcached and Redis, have become indispensable components of data center infrastructures and have been used to cache performance-critical data to avoid expensive back-end database accesses. As the memory is usually not large enough to hold all the items, cache replacement must be performed to evict some cached items to make room for the newly coming items when there is no free space. Many real-world workloads target small items and have frequent bursts of scans (a scan is a sequence of one-time access requests). The commonly used LRU policy does not work well under such workloads since LRU needs a large amount of metadata and tends to discard hot items with scans. Small decreases in hit ratio can result in large end-to-end losses in these systems. This paper presents MemSC, which is a scan-resistant and compact cache replacement framework for Memcached. MemSC assigns a multi-granularity reference flag for each item, which requires only a few bits (two bits are enough for general use) per item to support scanresistant cache replacement policies. To evaluate MemSC, we implement three representative cache replacement policies (MemSC-HM, MemSC-LH, and MemSC-LF) on MemSC and test them using various workloads. The experimental results show that MemSC outperforms prior techniques. Compared with the optimized LRU policy in Memcached, MemSC-LH reduces the cache miss ratio and the memory usage of the resulting system by up to 23% and 14% respectively.展开更多
Large-scale key-value stores are widely used in many Web-based systems to store huge amount of data as(key, value) pairs. In order to reduce the latency of accessing such(key, value) pairs, an in-memory cache system i...Large-scale key-value stores are widely used in many Web-based systems to store huge amount of data as(key, value) pairs. In order to reduce the latency of accessing such(key, value) pairs, an in-memory cache system is usually deployed between the front-end Web system and the back-end database system. In practice, a cache system may consist of a number of server nodes, and fault tolerance is a critical feature to maintain the latency Service-Level Agreements(SLAs). In this paper, we present the design, implementation, analysis, and evaluation of R-Memcached, a reliable in-memory key-value cache system that is built on top of the popular Memcached software. R-Memcached exploits coding techniques to achieve reliability, and can tolerate up to two node failures.Our experimental results show that R-Memcached can maintain very good latency and throughput performance even during the period of node failures.展开更多
基金supported by a grant fromthe National Key R&DProgram of China.
文摘In recent years,the research field of data collection under local differential privacy(LDP)has expanded its focus fromelementary data types to includemore complex structural data,such as set-value and graph data.However,our comprehensive review of existing literature reveals that there needs to be more studies that engage with key-value data collection.Such studies would simultaneously collect the frequencies of keys and the mean of values associated with each key.Additionally,the allocation of the privacy budget between the frequencies of keys and the means of values for each key does not yield an optimal utility tradeoff.Recognizing the importance of obtaining accurate key frequencies and mean estimations for key-value data collection,this paper presents a novel framework:the Key-Strategy Framework forKey-ValueDataCollection under LDP.Initially,theKey-StrategyUnary Encoding(KS-UE)strategy is proposed within non-interactive frameworks for the purpose of privacy budget allocation to achieve precise key frequencies;subsequently,the Key-Strategy Generalized Randomized Response(KS-GRR)strategy is introduced for interactive frameworks to enhance the efficiency of collecting frequent keys through group-anditeration methods.Both strategies are adapted for scenarios in which users possess either a single or multiple key-value pairs.Theoretically,we demonstrate that the variance of KS-UE is lower than that of existing methods.These claims are substantiated through extensive experimental evaluation on real-world datasets,confirming the effectiveness and efficiency of the KS-UE and KS-GRR strategies.
文摘随着互联网技术的迅猛发展,越来越多的非结构化数据涌入到人们的生活中,为这些数据建立高效的索引面临极大的挑战.键值数据库Key-Value以其结构简单和高扩展性而引起人们的广泛关注,已成为海量数据存储系统中的重要组成部分.由于Key-Value系统对吞吐量要求较高,而基于Flash的固态硬盘(solid state drive,SSD)能够提供很高的随机读性能,在SSD上构建Key-Value系统已成为海量数据存储领域的一大研究热点.鉴于Flash具有非定点更新、寿命有限等特性,基于SSD的KeyValue系统必须针对Flash的特性作专门优化.以一种称为SkimpyStash的基于SSD的Key-Value系统为基础,提出了一种新的Key-Value系统低延迟存储系统(low latency store,LLStore).LLStore使用内存文件映射技术来减少针对SSD的IO请求,除此之外,针对SkimpyStash中低效的压缩策略,提出一种改进方法,可以在少量增加内存开销的情况下极大地减少查询时间.通过与原系统的性能比较实验,LLStore在平均查询时间上可以获得至少12%的加速.
文摘The key-value store can provide flexibility of data types because it does not need to specify the data types to be stored in advance and can store any types of data as the value of the key-value pair.Various types of studies have been conducted to improve the performance of the key-value store while maintaining its flexibility.However,the research efforts storing the large-scale values such as multimedia data files(e.g.,images or videos)in the key-value store were limited.In this study,we propose a new key-value store,WR-Store++aiming to store the large-scale values stably.Specifically,it provides a new design of separating data and index by working with the built-in data structure of the Windows operating system and the file system.The utilization of the built-in data structure of the Windows operating system achieves the efficiency of the key-value store and that of the file system extends the limited space of the storage significantly.We also present chunk-based memory management and parallel processing of WR-Store++to further improve its performance in the GET operation.Through the experiments,we show that WR-Store++can store at least 32.74 times larger datasets than the existing baseline key-value store,WR-Store,which has the limitation in storing large-scale data sets.Furthermore,in terms of processing efficiency,we show that WR-Store++outperforms not only WR-Store but also the other state-ofthe-art key-value stores,LevelDB,RocksDB,and BerkeleyDB,for individual key-value operations and mixed workloads.
文摘The structure of key-value data is a typical data structure generated by mobile devices.The collection and analysis of the data from mobile devices are critical for service providers to improve service quality.Nevertheless,collecting raw data,which may contain various per⁃sonal information,would lead to serious personal privacy leaks.Local differential privacy(LDP)has been proposed to protect privacy on the device side so that the server cannot obtain the raw data.However,existing mechanisms assume that all keys are equally sensitive,which can⁃not produce high-precision statistical results.A utility-improved data collection framework with LDP for key-value formed mobile data is pro⁃posed to solve this issue.More specifically,we divide the key-value data into sensitive and non-sensitive parts and only provide an LDPequivalent privacy guarantee for sensitive keys and all values.We instantiate our framework by using a utility-improved key value-unary en⁃coding(UKV-UE)mechanism based on unary encoding,with which our framework can work effectively for a large key domain.We then vali⁃date our mechanism which provides better utility and is suitable for mobile devices by evaluating it in two real datasets.Finally,some pos⁃sible future research directions are envisioned.
文摘As more and more application systems related to big data were developed, NoSQL (Not Only SQL) database systems are becoming more and more popular. In order to add transaction features for some NoSQL database systems, many scholars have tried different techniques. Unfortunately, there is a lack of research on Redis’s transaction in the existing literatures. This paper proposes a transaction model for key-value NoSQL databases including Redis to make possible allowing users to access data in the ACID (Atomicity, Consistency, Isolation and Durability) way, and this model is vividly called the surfing concurrence transaction model. The architecture, important features and implementation principle are described in detail. The key algorithms also were given in the form of pseudo program code, and the performance also was evaluated. With the proposed model, the transactions of Key-Value NoSQL databases can be performed in a lock free and MVCC (Multi-Version Concurrency Control) free manner. This is the result of further research on the related topic, which fills the gap ignored by relevant scholars in this field to make a little contribution to the further development of NoSQL technology.
文摘The tail latency of end-user requests,which directly impacts the user experience and the revenue,is highly related to its corresponding numerous accesses in key-value stores.The replica selection algorithm is crucial to cut the tail latency of these key-value accesses.Recently,the C3 algorithm,which creatively piggybacks the queue-size of waiting keys from replica servers for the replica selection at clients,is proposed in NSDI 2015.Although C3 improves the tail latency a lot,it suffers from the timeliness issue on the feedback information,which directly influences the replica selection.In this paper,we analysis the evaluation of queuesize of waiting keys of C3,and some findings of queue-size variation were made.It motivate us to propose the Prediction-Based Replica Selection(PRS)algorithm,which predicts the queue-size at replica servers under the poor timeliness condition,instead of utilizing the exponentially weighted moving average of the state piggybacked queue-size as in C3.Consequently,PRS can obtain more accurate queue-size at clients than C3,and thus outperforms C3 in terms of cutting the tail latency.Simulation results confirm the advantage of PRS over C3.
基金Project supported by the National Key Research and Development Project of China(No.2022YFB2702101)the Shaanxi Province Key Industrial Projects,China(Nos.2021ZDLGY03-02 and 2021ZDLGY03-08)the National Natural Science Foundation of China(No.92152301)。
文摘Based on a log-structured merge(LSM)tree,the key-value(KV)storage system can provide high reading performance and optimize random writing performance.It is widely used in modern data storage systems like e-commerce,online analytics,and real-time communication.An LSM tree stores new KV data in the memory and flushes to disk in batches.To prevent data loss in memory if there is an unexpected crash,RocksDB appends updating data in the write-ahead log(WAL)before updating the memory.However,synchronous WAL significantly reduces writing performance.In this paper,we present a new WAL mechanism named MyWAL.It directly manages raw devices(or partitions)instead of saving data on a traditional file system.These can avoid useless metadata updating and write data sequentially on disks.Experimental results show that MyWAL can significantly improve the data writing performance of RocksDB compared to the traditional WAL for small KV data on solid-state disks(SSDs),as much as five to eight times faster.On non-volatile memory express soild-state drives(NVMe SSDs)and non-volatile memory(NVM),MyWAL can improve data writing performance by 10%–30%.Furthermore,the results of YCSB(Yahoo!Cloud Serving Benchmark)show that the latency decreased by 50%compared with SpanDB.
文摘The virtual test platform is a vital tool for ship simulation and testing.However,the numerical pool ship virtual test platform is a complex system that comprises multiple heterogeneous data types,such as relational data,files,text,images,and animations.The analysis,evaluation,and decision-making processes heavily depend on data,which continue to increase in size and complexity.As a result,there is an increasing need for a distributed database system to manage these data.In this paper,we propose a Key-Value database based on a distributed system that can operate on any type of data,regardless of its size or type.This database architecture supports class column storage and load balancing and optimizes the efficiency of I/O bandwidth and CPU resource utilization.Moreover,it is specif-ically designed to handle the storage and access of largefiles.Additionally,we propose a multimodal data fusion mechanism that can connect various descrip-tions of the same substance,enabling the fusion and retrieval of heterogeneous multimodal data to facilitate data analysis.Our approach focuses on indexing and storage,and we compare our solution with Redis,MongoDB,and MySQL through experiments.We demonstrate the performance,scalability,and reliability of our proposed database system while also analysing its architecture’s defects and providing optimization solutions and future research directions.In conclu-sion,our database system provides an efficient and reliable solution for the data management of the virtual test platform of numerical pool ships.
文摘Memory-based key-value cache systems, such as Memcached and Redis, have become indispensable components of data center infrastructures and have been used to cache performance-critical data to avoid expensive back-end database accesses. As the memory is usually not large enough to hold all the items, cache replacement must be performed to evict some cached items to make room for the newly coming items when there is no free space. Many real-world workloads target small items and have frequent bursts of scans (a scan is a sequence of one-time access requests). The commonly used LRU policy does not work well under such workloads since LRU needs a large amount of metadata and tends to discard hot items with scans. Small decreases in hit ratio can result in large end-to-end losses in these systems. This paper presents MemSC, which is a scan-resistant and compact cache replacement framework for Memcached. MemSC assigns a multi-granularity reference flag for each item, which requires only a few bits (two bits are enough for general use) per item to support scanresistant cache replacement policies. To evaluate MemSC, we implement three representative cache replacement policies (MemSC-HM, MemSC-LH, and MemSC-LF) on MemSC and test them using various workloads. The experimental results show that MemSC outperforms prior techniques. Compared with the optimized LRU policy in Memcached, MemSC-LH reduces the cache miss ratio and the memory usage of the resulting system by up to 23% and 14% respectively.
基金supported in part by Hong Kong GRF grant HKBU 210412 and HKBU grant FRG2/14-15/059
文摘Large-scale key-value stores are widely used in many Web-based systems to store huge amount of data as(key, value) pairs. In order to reduce the latency of accessing such(key, value) pairs, an in-memory cache system is usually deployed between the front-end Web system and the back-end database system. In practice, a cache system may consist of a number of server nodes, and fault tolerance is a critical feature to maintain the latency Service-Level Agreements(SLAs). In this paper, we present the design, implementation, analysis, and evaluation of R-Memcached, a reliable in-memory key-value cache system that is built on top of the popular Memcached software. R-Memcached exploits coding techniques to achieve reliability, and can tolerate up to two node failures.Our experimental results show that R-Memcached can maintain very good latency and throughput performance even during the period of node failures.