Funding: Supported by a grant from the National Key R&D Program of China.
Abstract: In recent years, research on data collection under local differential privacy (LDP) has expanded from elementary data types to more complex structured data, such as set-valued and graph data. However, our review of the existing literature shows that few studies address key-value data collection, which must simultaneously collect the frequency of each key and the mean of the values associated with it. Moreover, the way existing methods allocate the privacy budget between key frequencies and value means does not yield an optimal utility tradeoff. Recognizing the importance of accurate key frequencies and mean estimates in key-value data collection, this paper presents a novel framework: the Key-Strategy Framework for Key-Value Data Collection under LDP. First, the Key-Strategy Unary Encoding (KS-UE) strategy is proposed for non-interactive frameworks; it allocates the privacy budget so as to achieve precise key frequencies. Second, the Key-Strategy Generalized Randomized Response (KS-GRR) strategy is introduced for interactive frameworks; it improves the efficiency of collecting frequent keys through group-and-iteration methods. Both strategies are adapted to scenarios in which users hold either a single key-value pair or multiple pairs. Theoretically, we show that the variance of KS-UE is lower than that of existing methods. These claims are substantiated by extensive experiments on real-world datasets, which confirm the effectiveness and efficiency of the KS-UE and KS-GRR strategies.
Abstract: With the development of the Internet of Things (IoT), the delay caused by network transmission has led to low data processing efficiency. At the same time, the limited computing power and energy budget of IoT terminal devices are important bottlenecks that restrict the application of blockchain, a problem that edge computing can address. Edge computing can effectively reduce data transmission delay and improve data processing capacity. However, user data in edge computing is usually stored and processed by honest-but-curious authorized entities, which can leak users' private information. To solve these problems, this paper proposes a location data collection method that satisfies local differential privacy to protect users' privacy. A Voronoi diagram constructed by the Delaunay method is used to divide the road network space and to determine the Voronoi grid region in which each edge node is located. A random perturbation mechanism that satisfies local differential privacy is then used to perturb the original location data in each Voronoi grid. The effectiveness of the proposed privacy-preserving mechanism is verified through comparison experiments. Compared with existing privacy-preserving methods, the proposed mechanism not only better meets users' privacy needs but also achieves higher data availability.
Funding: Supported in part by the National Natural Science Foundation of China under Grant No. 61972371 and by the Youth Innovation Promotion Association of the Chinese Academy of Sciences (CAS) under Grant No. Y202093.
Abstract: By integrating the traditional power grid with information and communication technology, the smart grid achieves dependable, efficient, and flexible grid data processing. Smart meters deployed on the user side of the smart grid collect users' power usage data at regular intervals and upload it to the control center, completing smart grid data acquisition. The control center can evaluate the supply and demand of the power grid from the aggregated user data and then dynamically adjust the power supply, the price, and so on. However, since the grid data collected from users may disclose their electricity usage habits and daily activities, privacy has become a critical concern in smart grid data aggregation. Most existing privacy-preserving data collection schemes for the smart grid adopt homomorphic encryption or randomization techniques, which are either impractical because of their high computation overhead or unrealistic because they require a trusted third party.
Funding: Supported by the National Key R&D Program of China under Grant 2020YFB1806904; by the National Natural Science Foundation of China under Grants 61872416, 62171189, 62172438 and 62071192; by the Fundamental Research Funds for the Central Universities of China under Grants 2019kfyXJJS017, 31732111303 and 31512111310; and by the special fund for Wuhan Yellow Crane Talents (Excellent Young Scholar).
Abstract: Federated Learning (FL) is a new computing paradigm in privacy-preserving Machine Learning (ML), in which the ML model is trained in a decentralized manner by the clients, preventing the server from directly accessing the clients' privacy-sensitive data. Unfortunately, recent advances have shown potential risks of user-level privacy breaches under the cross-silo FL framework. In this paper, we propose to address the issue with a three-plane framework that secures cross-silo FL by taking advantage of the Local Differential Privacy (LDP) mechanism. The key insight is that LDP can provide strong data privacy protection while still retaining user data statistics, thereby preserving high utility. Experimental results on three real-world datasets demonstrate the effectiveness of our framework.
Funding: This work is supported by the Major Scientific and Technological Special Project of Guizhou Province (20183001); Research on the education mode for complicate skill students in new media with cross specialty integration (22150117092); and the Open Foundation of Guizhou Provincial Key Laboratory of Public Big Data (2018BDKFJJ014, 2018BDKFJJ019, 2018BDKFJJ022).
Abstract: In recent years, as the Internet of Vehicles (IoV) has become increasingly intelligent, the problem of privacy leakage in the IoV has grown more prominent, and research on IoV privacy protection has attracted wide attention. This paper analyzes the advantages and disadvantages of existing location privacy protection architectures and algorithms, proposes a privacy protection architecture based on an untrusted data collection server, and designs a vehicle location acquisition algorithm based on local differential privacy and a game model. The algorithm first divides the road network space into a mesh. Then, a dynamic game model is introduced between the user's location privacy protection model and the attacker's location semantic inference model, thereby minimizing the possibility of exposing the regional semantic privacy of the k-location set while maximizing the availability of the service. On this basis, a statistical method is designed that satisfies local differential privacy for k-location sets and obtains an unbiased estimate of traffic density in different regions. Finally, the algorithm is verified on a dataset of mobile vehicles in Shanghai. The experimental results show that the algorithm can guarantee users' location privacy and location semantic privacy while satisfying service quality requirements, and that it provides better privacy protection and service for IoV users.
Funding: Supported by the Agence Nationale de la Recherche (ANR) (contract "ANR-17-EURE-0002"), by the Region of Bourgogne Franche-Comté CADRAN Project, and by the European Research Council (ERC) project HYPATIA under the European Union's Horizon 2020 research and innovation programme, grant agreement no. 835294.
Abstract: This paper investigates the problem of collecting multidimensional data over time (i.e., longitudinal studies) for the fundamental task of frequency estimation under Local Differential Privacy (LDP) guarantees. In contrast to frequency estimation of a single attribute, the multidimensional aspect demands particular attention to the privacy budget. Besides, when user statistics are collected longitudinally, privacy progressively degrades. Indeed, the "multiple" settings in combination (i.e., many attributes and several collections over time) impose several challenges, for which this paper proposes the first solution for frequency estimates under LDP. To tackle these issues, we extend the analysis of three state-of-the-art LDP protocols, Generalized Randomized Response (GRR), Optimized Unary Encoding (OUE), and Symmetric Unary Encoding (SUE), to both longitudinal and multidimensional data collections. While the known literature uses OUE and SUE for two rounds of sanitization (a.k.a. memoization), i.e., L-OUE and L-SUE, respectively, we show analytically and experimentally that starting with OUE and then applying SUE provides higher data utility (i.e., L-OSUE). Also, for attributes with small domain sizes, we propose Longitudinal GRR (L-GRR), which provides higher utility than the other protocols based on unary encoding. Last, we propose a new solution named Adaptive LDP for LOngitudinal and Multidimensional FREquency Estimates (ALLOMFREE), which randomly samples a single attribute to be sent with the whole privacy budget and adaptively selects the optimal protocol, i.e., either L-GRR or L-OSUE. As shown in the results, ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimates.
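To make the frequency-estimation task concrete, the following is a minimal sketch of plain single-round GRR and its standard unbiased estimator. It is not L-GRR or ALLOMFREE (it has no memoization and no attribute sampling), and the function names `grr_perturb` and `grr_estimate` are hypothetical names chosen for this illustration.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """Report the true value with probability p, otherwise a uniformly
    chosen different value from the domain (single-round GRR)."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def grr_estimate(reports, domain, epsilon):
    """Unbiased frequency estimate for each domain value from GRR reports."""
    k = len(domain)
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)    # probability of reporting the true value
    q = 1.0 / (e + k - 1)  # probability of reporting one specific other value
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}
```

Because p shrinks as the domain size k grows, GRR tends to suit small domains, which is consistent with the abstract proposing L-GRR for attributes with small domain sizes.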
Funding: This paper is supported by the Inner Mongolia Natural Science Foundation (Grant Number: 2018MS06026; Sponsored Authors: Liu, H. and Ma, X.; Sponsor's Website: http://kjt.nmg.gov.cn/) and by the Science and Technology Program of Inner Mongolia Autonomous Region (Grant Number: 2019GG116; Sponsored Authors: Liu, H. and Ma, X.; Sponsor's Website: http://kjt.nmg.gov.cn/).
Abstract: Frequent itemset mining is an essential problem in data mining and plays a key role in many data mining applications. However, users' personal privacy can be leaked during the mining process. In recent years, applying local differential privacy protection models to frequent itemset mining has emerged as a relatively reliable and secure protection method. Local differential privacy means that users first perturb their original data and then send the perturbed data to the aggregator, which prevents the aggregator from learning the users' private information. We propose a novel framework that implements frequent itemset mining under local differential privacy and is applicable to users with multiple attributes. The main technique uses bitmap encoding to convert each user's original data into a binary string. The framework also addresses how to choose the best perturbation algorithm for varying user attributes, and it uses the frequent pattern tree (FP-tree) algorithm to mine frequent itemsets. Finally, we incorporate the threshold random response (TRR) algorithm into the framework, compare it with existing algorithms, and demonstrate that the TRR algorithm achieves higher accuracy for mining frequent itemsets.
Funding: Supported by the National Natural Science Foundation of China (No. 61871037) and by the Natural Science Foundation of Beijing (No. M21035).
Abstract: Mobile edge computing (MEC) is an emerging technology that extends cloud computing to the edge of a network. MEC has been applied to a variety of services. In particular, MEC can help reduce network delay and improve the service quality of recommendation systems. In a MEC-based recommendation system, users' rating data are collected and analyzed by the edge servers. If the servers behave dishonestly or break down, users' privacy may be disclosed. To solve this issue, we design a recommendation framework that applies local differential privacy (LDP) to collaborative filtering. In the proposed framework, users' rating data are perturbed to satisfy LDP and then released to the edge servers. The edge servers perform part of the computing task using the perturbed data. The cloud computing center computes the similarity between items using the results generated by the edge servers. We propose a data perturbation method to protect users' original rating values, in which the Harmony mechanism is modified so as to preserve the accuracy of similarity computation. To further enhance privacy protection, we propose two methods that protect both users' rating values and their rating behaviors. Experimental results on real-world data demonstrate that the proposed methods perform better than existing differentially private recommendation methods.
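For readers unfamiliar with the Harmony mechanism mentioned above, here is a minimal sketch of the commonly described unmodified version for mean estimation: a value in [-1, 1] is discretized to one bit, the bit is passed through binary randomized response, and the report is rescaled so that its expectation equals the input. The modification proposed in the abstract is not reproduced here. The names `normalize_rating`, `harmony_perturb`, and `estimate_mean`, and the 1-to-5 rating scale, are assumptions made for this illustration.

```python
import math
import random


def normalize_rating(rating, lo=1.0, hi=5.0):
    """Map a rating in [lo, hi] linearly onto [-1, 1] (assumed rating scale)."""
    return 2.0 * (rating - lo) / (hi - lo) - 1.0


def harmony_perturb(v, epsilon, rng):
    """Perturb v in [-1, 1] into a single report in {-c, +c}."""
    e = math.exp(epsilon)
    c = (e + 1.0) / (e - 1.0)  # rescaling that makes the report unbiased
    # Discretize: +1 with probability (1 + v) / 2, so the bit has expectation v.
    bit = 1 if rng.random() < (1.0 + v) / 2.0 else -1
    # Binary randomized response: keep the bit with probability e / (e + 1).
    if rng.random() >= e / (e + 1.0):
        bit = -bit
    return bit * c


def estimate_mean(reports):
    """The average of the rescaled reports is an unbiased estimate of the mean."""
    return sum(reports) / len(reports)
```

Each user sends only one of two possible values, so the server learns little about any individual rating, while the average over many users converges to the true normalized mean.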
Abstract: Key-value data is a typical data structure generated by mobile devices. Collecting and analyzing data from mobile devices is critical for service providers to improve service quality. Nevertheless, collecting raw data, which may contain various kinds of personal information, can lead to serious privacy leaks. Local differential privacy (LDP) has been proposed to protect privacy on the device side so that the server cannot obtain the raw data. However, existing mechanisms assume that all keys are equally sensitive, which prevents them from producing high-precision statistical results. A utility-improved data collection framework with LDP for key-value mobile data is proposed to solve this issue. More specifically, we divide the key-value data into sensitive and non-sensitive parts and provide an LDP-equivalent privacy guarantee only for sensitive keys and all values. We instantiate our framework with a utility-improved key-value unary encoding (UKV-UE) mechanism based on unary encoding, with which our framework can work effectively for a large key domain. We then validate our mechanism on two real datasets, showing that it provides better utility and is suitable for mobile devices. Finally, some possible future research directions are envisioned.
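As background for the unary-encoding building block, the sketch below shows generic unary encoding for frequency estimation with the usual symmetric (SUE) and optimized (OUE) bit-flip probabilities. It treats every position as equally sensitive, so it is not the UKV-UE mechanism itself, and it does not handle values. The function names `ue_params`, `ue_perturb`, and `ue_estimate` are hypothetical.

```python
import math
import random


def ue_params(epsilon, optimized=True):
    """Return (p, q): probability of keeping a 1-bit as 1, and probability
    of turning a 0-bit into 1."""
    if optimized:  # OUE
        return 0.5, 1.0 / (math.exp(epsilon) + 1.0)
    half = math.exp(epsilon / 2.0)  # SUE
    return half / (half + 1.0), 1.0 / (half + 1.0)


def ue_perturb(index, d, epsilon, rng, optimized=True):
    """One-hot encode `index` over d positions, then perturb each bit independently."""
    p, q = ue_params(epsilon, optimized)
    return [1 if rng.random() < (p if i == index else q) else 0 for i in range(d)]


def ue_estimate(reports, epsilon, optimized=True):
    """Unbiased frequency estimate for each position from perturbed bit vectors."""
    p, q = ue_params(epsilon, optimized)
    n = len(reports)
    d = len(reports[0])
    ones = [sum(r[i] for r in reports) for i in range(d)]
    return [(s / n - q) / (p - q) for s in ones]
```

For both parameter choices the ratio p(1 - q) / (q(1 - p)) equals e^epsilon, which is what bounds the privacy loss of a single report. The per-position noise does not depend on the domain size d, which is one reason unary encoding is often used for large domains.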
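Abstract: High-dimensional data collected from large numbers of users is increasingly available, but such data involves users' personal privacy, and using it while protecting that privacy is highly challenging. This paper focuses on publishing high-dimensional data under local differential privacy. Existing solutions first build a probabilistic graphical model, generate a set of noisy low-dimensional marginal distributions of the input data, and then use them to approximate the joint distribution of the input dataset in order to generate a synthetic dataset. However, existing methods have limitations when computing the marginal distributions of a large number of attribute pairs to build the probabilistic graphical model, and when computing the joint distributions of large attribute subsets in the model. To address this, we propose PrivHDP (High-dimensional Data Publication Under Local Differential Privacy), a method for publishing high-dimensional data under local differential privacy. First, the method perturbs user data with random sampling response instead of the traditional privacy budget splitting strategy, and proposes an adaptive marginal distribution computation method to compute the marginal distributions of attribute pairs and build a Markov network. Second, it uses a new measure in place of mutual information to quantify the correlation between attribute pairs, introduces a threshold filtering technique based on high-pass filtering to reduce the search space of the graph construction process, and combines sufficient triangulation with the junction tree algorithm to obtain a set of attribute subsets. Finally, it computes the joint distributions over the attribute subsets based on joint distribution decomposition and redundancy elimination. Experiments on four real datasets show that PrivHDP outperforms comparable algorithms in k-way query accuracy and SVM classification accuracy, verifying the utility and efficiency of the proposed method.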