In recent years,the research field of data collection under local differential privacy(LDP)has expanded its focus fromelementary data types to includemore complex structural data,such as set-value and graph data.Howev...In recent years,the research field of data collection under local differential privacy(LDP)has expanded its focus fromelementary data types to includemore complex structural data,such as set-value and graph data.However,our comprehensive review of existing literature reveals that there needs to be more studies that engage with key-value data collection.Such studies would simultaneously collect the frequencies of keys and the mean of values associated with each key.Additionally,the allocation of the privacy budget between the frequencies of keys and the means of values for each key does not yield an optimal utility tradeoff.Recognizing the importance of obtaining accurate key frequencies and mean estimations for key-value data collection,this paper presents a novel framework:the Key-Strategy Framework forKey-ValueDataCollection under LDP.Initially,theKey-StrategyUnary Encoding(KS-UE)strategy is proposed within non-interactive frameworks for the purpose of privacy budget allocation to achieve precise key frequencies;subsequently,the Key-Strategy Generalized Randomized Response(KS-GRR)strategy is introduced for interactive frameworks to enhance the efficiency of collecting frequent keys through group-anditeration methods.Both strategies are adapted for scenarios in which users possess either a single or multiple key-value pairs.Theoretically,we demonstrate that the variance of KS-UE is lower than that of existing methods.These claims are substantiated through extensive experimental evaluation on real-world datasets,confirming the effectiveness and efficiency of the KS-UE and KS-GRR strategies.展开更多
The key-value store can provide flexibility of data types because it does not need to specify the data types to be stored in advance and can store any types of data as the value of the key-value pair.Various types of ...The key-value store can provide flexibility of data types because it does not need to specify the data types to be stored in advance and can store any types of data as the value of the key-value pair.Various types of studies have been conducted to improve the performance of the key-value store while maintaining its flexibility.However,the research efforts storing the large-scale values such as multimedia data files(e.g.,images or videos)in the key-value store were limited.In this study,we propose a new key-value store,WR-Store++aiming to store the large-scale values stably.Specifically,it provides a new design of separating data and index by working with the built-in data structure of the Windows operating system and the file system.The utilization of the built-in data structure of the Windows operating system achieves the efficiency of the key-value store and that of the file system extends the limited space of the storage significantly.We also present chunk-based memory management and parallel processing of WR-Store++to further improve its performance in the GET operation.Through the experiments,we show that WR-Store++can store at least 32.74 times larger datasets than the existing baseline key-value store,WR-Store,which has the limitation in storing large-scale data sets.Furthermore,in terms of processing efficiency,we show that WR-Store++outperforms not only WR-Store but also the other state-ofthe-art key-value stores,LevelDB,RocksDB,and BerkeleyDB,for individual key-value operations and mixed workloads.展开更多
The tail latency of end-user requests,which directly impacts the user experience and the revenue,is highly related to its corresponding numerous accesses in key-value stores.The replica selection algorithm is crucial ...The tail latency of end-user requests,which directly impacts the user experience and the revenue,is highly related to its corresponding numerous accesses in key-value stores.The replica selection algorithm is crucial to cut the tail latency of these key-value accesses.Recently,the C3 algorithm,which creatively piggybacks the queue-size of waiting keys from replica servers for the replica selection at clients,is proposed in NSDI 2015.Although C3 improves the tail latency a lot,it suffers from the timeliness issue on the feedback information,which directly influences the replica selection.In this paper,we analysis the evaluation of queuesize of waiting keys of C3,and some findings of queue-size variation were made.It motivate us to propose the Prediction-Based Replica Selection(PRS)algorithm,which predicts the queue-size at replica servers under the poor timeliness condition,instead of utilizing the exponentially weighted moving average of the state piggybacked queue-size as in C3.Consequently,PRS can obtain more accurate queue-size at clients than C3,and thus outperforms C3 in terms of cutting the tail latency.Simulation results confirm the advantage of PRS over C3.展开更多
随着互联网技术的迅猛发展,越来越多的非结构化数据涌入到人们的生活中,为这些数据建立高效的索引面临极大的挑战.键值数据库Key-Value以其结构简单和高扩展性而引起人们的广泛关注,已成为海量数据存储系统中的重要组成部分.由于Key-Va...随着互联网技术的迅猛发展,越来越多的非结构化数据涌入到人们的生活中,为这些数据建立高效的索引面临极大的挑战.键值数据库Key-Value以其结构简单和高扩展性而引起人们的广泛关注,已成为海量数据存储系统中的重要组成部分.由于Key-Value系统对吞吐量要求较高,而基于Flash的固态硬盘(solid state drive,SSD)能够提供很高的随机读性能,在SSD上构建Key-Value系统已成为海量数据存储领域的一大研究热点.鉴于Flash具有非定点更新、寿命有限等特性,基于SSD的KeyValue系统必须针对Flash的特性作专门优化.以一种称为SkimpyStash的基于SSD的Key-Value系统为基础,提出了一种新的Key-Value系统低延迟存储系统(low latency store,LLStore).LLStore使用内存文件映射技术来减少针对SSD的IO请求,除此之外,针对SkimpyStash中低效的压缩策略,提出一种改进方法,可以在少量增加内存开销的情况下极大地减少查询时间.通过与原系统的性能比较实验,LLStore在平均查询时间上可以获得至少12%的加速.展开更多
NoSQL系统因其高性能、高可扩展性的优势在大数据管理中得到广泛应用,而key-value(KV)模型则是NoSQL系统中使用最广泛的一种存储模型.KV型本地存储系统对于以机械磁盘为持久化存储的情形,存在许多性能优化技术,但这些优化技术面对当前...NoSQL系统因其高性能、高可扩展性的优势在大数据管理中得到广泛应用,而key-value(KV)模型则是NoSQL系统中使用最广泛的一种存储模型.KV型本地存储系统对于以机械磁盘为持久化存储的情形,存在许多性能优化技术,但这些优化技术面对当前的硬件发展新趋势,如多核处理器、大内存和低延迟闪存、非易失性内存NVM(Non-Volatile Memory)等,难以充分发挥新硬件的优势,如数据索引、并发控制、事务日志管理等技术在多核架构下存在多核扩展性问题,又如数据存储策略不适应闪存SSD(Solid State Drive)的新存储特性而产生了IO利用率低效的问题.针对多核处理器、大内存和闪存、NVM等硬件发展新趋势,文中面向当前的大数据应用背景,综述了KV型本地存储系统在索引技术、并发控制、事务日志管理和数据放置等核心模块上的最新优化技术和系统研究成果.从处理器、内存和持久化存储的角度概括了KV型本地存储系统当前存在的最优技术,总结了当前研究尚未解决的技术挑战,并对KV型本地存储系统在CPU缓存高效性、事务日志扩展性和高可用性等方面的研究进行了展望.展开更多
The structure of key-value data is a typical data structure generated by mobile devices.The collection and analysis of the data from mobile devices are critical for service providers to improve service quality.Neverth...The structure of key-value data is a typical data structure generated by mobile devices.The collection and analysis of the data from mobile devices are critical for service providers to improve service quality.Nevertheless,collecting raw data,which may contain various per⁃sonal information,would lead to serious personal privacy leaks.Local differential privacy(LDP)has been proposed to protect privacy on the device side so that the server cannot obtain the raw data.However,existing mechanisms assume that all keys are equally sensitive,which can⁃not produce high-precision statistical results.A utility-improved data collection framework with LDP for key-value formed mobile data is pro⁃posed to solve this issue.More specifically,we divide the key-value data into sensitive and non-sensitive parts and only provide an LDPequivalent privacy guarantee for sensitive keys and all values.We instantiate our framework by using a utility-improved key value-unary en⁃coding(UKV-UE)mechanism based on unary encoding,with which our framework can work effectively for a large key domain.We then vali⁃date our mechanism which provides better utility and is suitable for mobile devices by evaluating it in two real datasets.Finally,some pos⁃sible future research directions are envisioned.展开更多
As more and more application systems related to big data were developed, NoSQL (Not Only SQL) database systems are becoming more and more popular. In order to add transaction features for some NoSQL database systems, ...As more and more application systems related to big data were developed, NoSQL (Not Only SQL) database systems are becoming more and more popular. In order to add transaction features for some NoSQL database systems, many scholars have tried different techniques. Unfortunately, there is a lack of research on Redis’s transaction in the existing literatures. This paper proposes a transaction model for key-value NoSQL databases including Redis to make possible allowing users to access data in the ACID (Atomicity, Consistency, Isolation and Durability) way, and this model is vividly called the surfing concurrence transaction model. The architecture, important features and implementation principle are described in detail. The key algorithms also were given in the form of pseudo program code, and the performance also was evaluated. With the proposed model, the transactions of Key-Value NoSQL databases can be performed in a lock free and MVCC (Multi-Version Concurrency Control) free manner. This is the result of further research on the related topic, which fills the gap ignored by relevant scholars in this field to make a little contribution to the further development of NoSQL technology.展开更多
Data storage solutions are a crucial aspect of any application, significantly impacting data management and system performance. This article explores the rationale behind utilizing both SQL and NoSQL databases, addres...Data storage solutions are a crucial aspect of any application, significantly impacting data management and system performance. This article explores the rationale behind utilizing both SQL and NoSQL databases, addressing key questions about when each type is preferable. The background emphasizes the importance of selecting the appropriate database technology to meet specific application requirements. The purpose of this research is to provide a comprehensive guide for choosing between SQL and NoSQL databases based on various factors, including workload characteristics, scalability needs, and consistency requirements. To achieve this, we examine different strategies for implementing SQL and NoSQL databases in large-scale distributed applications and systems. The research method involves a comparative analysis of the features, advantages, and limitations of both database types. We specifically focus on scenarios involving read-heavy versus write-heavy systems and the trade-offs between availability and consistency. The results of this research indicate that SQL databases, with their relational structure and ACID compliance, are ideal for applications requiring complex queries and data integrity. In contrast, NoSQL databases, offering schema flexibility and horizontal scalability, are better suited for managing extensive datasets and high-velocity data ingestion. In conclusion, the selection of a database depends on the specific needs of the application. SQL databases are preferred for transactional systems with complex relationships, while NoSQL databases excel in scenarios demanding flexibility and scalability. The study provides insights into hybrid approaches, leveraging both database types to optimize system performance.展开更多
Nuonuo Kids Bookstore is a dreamlike reading kingdom,established by Zhaci and his team with boundless enthusiasm and considerate design.In such a magical space,laughters and exclamations of kids harmonize into the mos...Nuonuo Kids Bookstore is a dreamlike reading kingdom,established by Zhaci and his team with boundless enthusiasm and considerate design.In such a magical space,laughters and exclamations of kids harmonize into the most beautiful melody.Here,they learn to think,feel,love and also be loved.展开更多
基金supported by a grant fromthe National Key R&DProgram of China.
文摘In recent years,the research field of data collection under local differential privacy(LDP)has expanded its focus fromelementary data types to includemore complex structural data,such as set-value and graph data.However,our comprehensive review of existing literature reveals that there needs to be more studies that engage with key-value data collection.Such studies would simultaneously collect the frequencies of keys and the mean of values associated with each key.Additionally,the allocation of the privacy budget between the frequencies of keys and the means of values for each key does not yield an optimal utility tradeoff.Recognizing the importance of obtaining accurate key frequencies and mean estimations for key-value data collection,this paper presents a novel framework:the Key-Strategy Framework forKey-ValueDataCollection under LDP.Initially,theKey-StrategyUnary Encoding(KS-UE)strategy is proposed within non-interactive frameworks for the purpose of privacy budget allocation to achieve precise key frequencies;subsequently,the Key-Strategy Generalized Randomized Response(KS-GRR)strategy is introduced for interactive frameworks to enhance the efficiency of collecting frequent keys through group-anditeration methods.Both strategies are adapted for scenarios in which users possess either a single or multiple key-value pairs.Theoretically,we demonstrate that the variance of KS-UE is lower than that of existing methods.These claims are substantiated through extensive experimental evaluation on real-world datasets,confirming the effectiveness and efficiency of the KS-UE and KS-GRR strategies.
文摘The key-value store can provide flexibility of data types because it does not need to specify the data types to be stored in advance and can store any types of data as the value of the key-value pair.Various types of studies have been conducted to improve the performance of the key-value store while maintaining its flexibility.However,the research efforts storing the large-scale values such as multimedia data files(e.g.,images or videos)in the key-value store were limited.In this study,we propose a new key-value store,WR-Store++aiming to store the large-scale values stably.Specifically,it provides a new design of separating data and index by working with the built-in data structure of the Windows operating system and the file system.The utilization of the built-in data structure of the Windows operating system achieves the efficiency of the key-value store and that of the file system extends the limited space of the storage significantly.We also present chunk-based memory management and parallel processing of WR-Store++to further improve its performance in the GET operation.Through the experiments,we show that WR-Store++can store at least 32.74 times larger datasets than the existing baseline key-value store,WR-Store,which has the limitation in storing large-scale data sets.Furthermore,in terms of processing efficiency,we show that WR-Store++outperforms not only WR-Store but also the other state-ofthe-art key-value stores,LevelDB,RocksDB,and BerkeleyDB,for individual key-value operations and mixed workloads.
文摘The tail latency of end-user requests,which directly impacts the user experience and the revenue,is highly related to its corresponding numerous accesses in key-value stores.The replica selection algorithm is crucial to cut the tail latency of these key-value accesses.Recently,the C3 algorithm,which creatively piggybacks the queue-size of waiting keys from replica servers for the replica selection at clients,is proposed in NSDI 2015.Although C3 improves the tail latency a lot,it suffers from the timeliness issue on the feedback information,which directly influences the replica selection.In this paper,we analysis the evaluation of queuesize of waiting keys of C3,and some findings of queue-size variation were made.It motivate us to propose the Prediction-Based Replica Selection(PRS)algorithm,which predicts the queue-size at replica servers under the poor timeliness condition,instead of utilizing the exponentially weighted moving average of the state piggybacked queue-size as in C3.Consequently,PRS can obtain more accurate queue-size at clients than C3,and thus outperforms C3 in terms of cutting the tail latency.Simulation results confirm the advantage of PRS over C3.
文摘随着互联网技术的迅猛发展,越来越多的非结构化数据涌入到人们的生活中,为这些数据建立高效的索引面临极大的挑战.键值数据库Key-Value以其结构简单和高扩展性而引起人们的广泛关注,已成为海量数据存储系统中的重要组成部分.由于Key-Value系统对吞吐量要求较高,而基于Flash的固态硬盘(solid state drive,SSD)能够提供很高的随机读性能,在SSD上构建Key-Value系统已成为海量数据存储领域的一大研究热点.鉴于Flash具有非定点更新、寿命有限等特性,基于SSD的KeyValue系统必须针对Flash的特性作专门优化.以一种称为SkimpyStash的基于SSD的Key-Value系统为基础,提出了一种新的Key-Value系统低延迟存储系统(low latency store,LLStore).LLStore使用内存文件映射技术来减少针对SSD的IO请求,除此之外,针对SkimpyStash中低效的压缩策略,提出一种改进方法,可以在少量增加内存开销的情况下极大地减少查询时间.通过与原系统的性能比较实验,LLStore在平均查询时间上可以获得至少12%的加速.
文摘NoSQL系统因其高性能、高可扩展性的优势在大数据管理中得到广泛应用,而key-value(KV)模型则是NoSQL系统中使用最广泛的一种存储模型.KV型本地存储系统对于以机械磁盘为持久化存储的情形,存在许多性能优化技术,但这些优化技术面对当前的硬件发展新趋势,如多核处理器、大内存和低延迟闪存、非易失性内存NVM(Non-Volatile Memory)等,难以充分发挥新硬件的优势,如数据索引、并发控制、事务日志管理等技术在多核架构下存在多核扩展性问题,又如数据存储策略不适应闪存SSD(Solid State Drive)的新存储特性而产生了IO利用率低效的问题.针对多核处理器、大内存和闪存、NVM等硬件发展新趋势,文中面向当前的大数据应用背景,综述了KV型本地存储系统在索引技术、并发控制、事务日志管理和数据放置等核心模块上的最新优化技术和系统研究成果.从处理器、内存和持久化存储的角度概括了KV型本地存储系统当前存在的最优技术,总结了当前研究尚未解决的技术挑战,并对KV型本地存储系统在CPU缓存高效性、事务日志扩展性和高可用性等方面的研究进行了展望.
文摘The structure of key-value data is a typical data structure generated by mobile devices.The collection and analysis of the data from mobile devices are critical for service providers to improve service quality.Nevertheless,collecting raw data,which may contain various per⁃sonal information,would lead to serious personal privacy leaks.Local differential privacy(LDP)has been proposed to protect privacy on the device side so that the server cannot obtain the raw data.However,existing mechanisms assume that all keys are equally sensitive,which can⁃not produce high-precision statistical results.A utility-improved data collection framework with LDP for key-value formed mobile data is pro⁃posed to solve this issue.More specifically,we divide the key-value data into sensitive and non-sensitive parts and only provide an LDPequivalent privacy guarantee for sensitive keys and all values.We instantiate our framework by using a utility-improved key value-unary en⁃coding(UKV-UE)mechanism based on unary encoding,with which our framework can work effectively for a large key domain.We then vali⁃date our mechanism which provides better utility and is suitable for mobile devices by evaluating it in two real datasets.Finally,some pos⁃sible future research directions are envisioned.
文摘As more and more application systems related to big data were developed, NoSQL (Not Only SQL) database systems are becoming more and more popular. In order to add transaction features for some NoSQL database systems, many scholars have tried different techniques. Unfortunately, there is a lack of research on Redis’s transaction in the existing literatures. This paper proposes a transaction model for key-value NoSQL databases including Redis to make possible allowing users to access data in the ACID (Atomicity, Consistency, Isolation and Durability) way, and this model is vividly called the surfing concurrence transaction model. The architecture, important features and implementation principle are described in detail. The key algorithms also were given in the form of pseudo program code, and the performance also was evaluated. With the proposed model, the transactions of Key-Value NoSQL databases can be performed in a lock free and MVCC (Multi-Version Concurrency Control) free manner. This is the result of further research on the related topic, which fills the gap ignored by relevant scholars in this field to make a little contribution to the further development of NoSQL technology.
文摘Data storage solutions are a crucial aspect of any application, significantly impacting data management and system performance. This article explores the rationale behind utilizing both SQL and NoSQL databases, addressing key questions about when each type is preferable. The background emphasizes the importance of selecting the appropriate database technology to meet specific application requirements. The purpose of this research is to provide a comprehensive guide for choosing between SQL and NoSQL databases based on various factors, including workload characteristics, scalability needs, and consistency requirements. To achieve this, we examine different strategies for implementing SQL and NoSQL databases in large-scale distributed applications and systems. The research method involves a comparative analysis of the features, advantages, and limitations of both database types. We specifically focus on scenarios involving read-heavy versus write-heavy systems and the trade-offs between availability and consistency. The results of this research indicate that SQL databases, with their relational structure and ACID compliance, are ideal for applications requiring complex queries and data integrity. In contrast, NoSQL databases, offering schema flexibility and horizontal scalability, are better suited for managing extensive datasets and high-velocity data ingestion. In conclusion, the selection of a database depends on the specific needs of the application. SQL databases are preferred for transactional systems with complex relationships, while NoSQL databases excel in scenarios demanding flexibility and scalability. The study provides insights into hybrid approaches, leveraging both database types to optimize system performance.
文摘Nuonuo Kids Bookstore is a dreamlike reading kingdom,established by Zhaci and his team with boundless enthusiasm and considerate design.In such a magical space,laughters and exclamations of kids harmonize into the most beautiful melody.Here,they learn to think,feel,love and also be loved.