In recent years,the research field of data collection under local differential privacy(LDP)has expanded its focus fromelementary data types to includemore complex structural data,such as set-value and graph data.Howev...In recent years,the research field of data collection under local differential privacy(LDP)has expanded its focus fromelementary data types to includemore complex structural data,such as set-value and graph data.However,our comprehensive review of existing literature reveals that there needs to be more studies that engage with key-value data collection.Such studies would simultaneously collect the frequencies of keys and the mean of values associated with each key.Additionally,the allocation of the privacy budget between the frequencies of keys and the means of values for each key does not yield an optimal utility tradeoff.Recognizing the importance of obtaining accurate key frequencies and mean estimations for key-value data collection,this paper presents a novel framework:the Key-Strategy Framework forKey-ValueDataCollection under LDP.Initially,theKey-StrategyUnary Encoding(KS-UE)strategy is proposed within non-interactive frameworks for the purpose of privacy budget allocation to achieve precise key frequencies;subsequently,the Key-Strategy Generalized Randomized Response(KS-GRR)strategy is introduced for interactive frameworks to enhance the efficiency of collecting frequent keys through group-anditeration methods.Both strategies are adapted for scenarios in which users possess either a single or multiple key-value pairs.Theoretically,we demonstrate that the variance of KS-UE is lower than that of existing methods.These claims are substantiated through extensive experimental evaluation on real-world datasets,confirming the effectiveness and efficiency of the KS-UE and KS-GRR strategies.展开更多
With the development of Internet of Things(IoT),the delay caused by network transmission has led to low data processing efficiency.At the same time,the limited computing power and available energy consumption of IoT t...With the development of Internet of Things(IoT),the delay caused by network transmission has led to low data processing efficiency.At the same time,the limited computing power and available energy consumption of IoT terminal devices are also the important bottlenecks that would restrict the application of blockchain,but edge computing could solve this problem.The emergence of edge computing can effectively reduce the delay of data transmission and improve data processing capacity.However,user data in edge computing is usually stored and processed in some honest-but-curious authorized entities,which leads to the leakage of users’privacy information.In order to solve these problems,this paper proposes a location data collection method that satisfies the local differential privacy to protect users’privacy.In this paper,a Voronoi diagram constructed by the Delaunay method is used to divide the road network space and determine the Voronoi grid region where the edge nodes are located.A random disturbance mechanism that satisfies the local differential privacy is utilized to disturb the original location data in each Voronoi grid.In addition,the effectiveness of the proposed privacy-preserving mechanism is verified through comparison experiments.Compared with the existing privacy-preserving methods,the proposed privacy-preserving mechanism can not only better meet users’privacy needs,but also have higher data availability.展开更多
Federated Learning(FL)is a new computing paradigm in privacy-preserving Machine Learning(ML),where the ML model is trained in a decentralized manner by the clients,preventing the server from directly accessing privacy...Federated Learning(FL)is a new computing paradigm in privacy-preserving Machine Learning(ML),where the ML model is trained in a decentralized manner by the clients,preventing the server from directly accessing privacy-sensitive data from the clients.Unfortunately,recent advances have shown potential risks for user-level privacy breaches under the cross-silo FL framework.In this paper,we propose addressing the issue by using a three-plane framework to secure the cross-silo FL,taking advantage of the Local Differential Privacy(LDP)mechanism.The key insight here is that LDP can provide strong data privacy protection while still retaining user data statistics to preserve its high utility.Experimental results on three real-world datasets demonstrate the effectiveness of our framework.展开更多
In recent years,with the continuous advancement of the intelligent process of the Internet of Vehicles(IoV),the problem of privacy leakage in IoV has become increasingly prominent.The research on the privacy protectio...In recent years,with the continuous advancement of the intelligent process of the Internet of Vehicles(IoV),the problem of privacy leakage in IoV has become increasingly prominent.The research on the privacy protection of the IoV has become the focus of the society.This paper analyzes the advantages and disadvantages of the existing location privacy protection system structure and algorithms,proposes a privacy protection system structure based on untrusted data collection server,and designs a vehicle location acquisition algorithm based on a local differential privacy and game model.The algorithm first meshes the road network space.Then,the dynamic game model is introduced into the game user location privacy protection model and the attacker location semantic inference model,thereby minimizing the possibility of exposing the regional semantic privacy of the k-location set while maximizing the availability of the service.On this basis,a statistical method is designed,which satisfies the local differential privacy of k-location sets and obtains unbiased estimation of traffic density in different regions.Finally,this paper verifies the algorithm based on the data set of mobile vehicles in Shanghai.The experimental results show that the algorithm can guarantee the user’s location privacy and location semantic privacy while satisfying the service quality requirements,and provide better privacy protection and service for the users of the IoV.展开更多
By integrating the traditional power grid with information and communication technology, smart grid achieves dependable, efficient, and flexible grid data processing. The smart meters deployed on the user side of the ...By integrating the traditional power grid with information and communication technology, smart grid achieves dependable, efficient, and flexible grid data processing. The smart meters deployed on the user side of the smart grid collect the users' power usage data on a regular basis and upload it to the control center to complete the smart grid data acquisition. The control center can evaluate the supply and demand of the power grid through aggregated data from users and then dynamically adjust the power supply and price, etc. However, since the grid data collected from users may disclose the user's electricity usage habits and daily activities, privacy concern has become a critical issue in smart grid data aggregation. Most of the existing privacy-preserving data collection schemes for smart grid adopt homomorphic encryption or randomization techniques which are either impractical because of the high computation overhead or unrealistic for requiring a trusted third party.展开更多
Frequent itemset mining is an essential problem in data mining and plays a key role in many data mining applications.However,users’personal privacy will be leaked in the mining process.In recent years,application of ...Frequent itemset mining is an essential problem in data mining and plays a key role in many data mining applications.However,users’personal privacy will be leaked in the mining process.In recent years,application of local differential privacy protection models to mine frequent itemsets is a relatively reliable and secure protection method.Local differential privacy means that users first perturb the original data and then send these data to the aggregator,preventing the aggregator from revealing the user’s private information.We propose a novel framework that implements frequent itemset mining under local differential privacy and is applicable to user’s multi-attribute.The main technique has bitmap encoding for converting the user’s original data into a binary string.It also includes how to choose the best perturbation algorithm for varying user attributes,and uses the frequent pattern tree(FP-tree)algorithm to mine frequent itemsets.Finally,we incorporate the threshold random response(TRR)algorithm in the framework and compare it with the existing algorithms,and demonstrate that the TRR algorithm has higher accuracy for mining frequent itemsets.展开更多
Mobile edge computing(MEC)is an emerging technolohgy that extends cloud computing to the edge of a network.MEC has been applied to a variety of services.Specially,MEC can help to reduce network delay and improve the s...Mobile edge computing(MEC)is an emerging technolohgy that extends cloud computing to the edge of a network.MEC has been applied to a variety of services.Specially,MEC can help to reduce network delay and improve the service quality of recommendation systems.In a MEC-based recommendation system,users’rating data are collected and analyzed by the edge servers.If the servers behave dishonestly or break down,users’privacy may be disclosed.To solve this issue,we design a recommendation framework that applies local differential privacy(LDP)to collaborative filtering.In the proposed framework,users’rating data are perturbed to satisfy LDP and then released to the edge servers.The edge servers perform partial computing task by using the perturbed data.The cloud computing center computes the similarity between items by using the computing results generated by edge servers.We propose a data perturbation method to protect user’s original rating values,where the Harmony mechanism is modified so as to preserve the accuracy of similarity computation.And to enhance the protection of privacy,we propose two methods to protect both users’rating values and rating behaviors.Experimental results on real-world data demonstrate that the proposed methods perform better than existing differentially private recommendation methods.展开更多
The structure of key-value data is a typical data structure generated by mobile devices.The collection and analysis of the data from mobile devices are critical for service providers to improve service quality.Neverth...The structure of key-value data is a typical data structure generated by mobile devices.The collection and analysis of the data from mobile devices are critical for service providers to improve service quality.Nevertheless,collecting raw data,which may contain various per⁃sonal information,would lead to serious personal privacy leaks.Local differential privacy(LDP)has been proposed to protect privacy on the device side so that the server cannot obtain the raw data.However,existing mechanisms assume that all keys are equally sensitive,which can⁃not produce high-precision statistical results.A utility-improved data collection framework with LDP for key-value formed mobile data is pro⁃posed to solve this issue.More specifically,we divide the key-value data into sensitive and non-sensitive parts and only provide an LDPequivalent privacy guarantee for sensitive keys and all values.We instantiate our framework by using a utility-improved key value-unary en⁃coding(UKV-UE)mechanism based on unary encoding,with which our framework can work effectively for a large key domain.We then vali⁃date our mechanism which provides better utility and is suitable for mobile devices by evaluating it in two real datasets.Finally,some pos⁃sible future research directions are envisioned.展开更多
This paper investigates the problem of collecting multidimensional data throughout time(i.e.,longitudinal studies)for the fundamental task of frequency estimation under Local Differential Privacy(LDP)guarantees.Contra...This paper investigates the problem of collecting multidimensional data throughout time(i.e.,longitudinal studies)for the fundamental task of frequency estimation under Local Differential Privacy(LDP)guarantees.Contrary to frequency estimation of a single attribute,the multidimensional aspect demands particular attention to the privacy budget.Besides,when collecting user statistics longitudinally,privacy progressively degrades.Indeed,the“multiple”settings in combination(i.e.,many attributes and several collections throughout time)impose several challenges,for which this paper proposes the first solution for frequency estimates under LDP.To tackle these issues,we extend the analysis of three state-of-the-art LDP protocols(Generalized Randomized Response–GRR,Optimized Unary Encoding–OUE,and Symmetric Unary Encoding–SUE)for both longitudinal and multidimensional data collections.While the known literature uses OUE and SUE for two rounds of sanitization(a.k.a.memoization),i.e.,L-OUE and L-SUE,respectively,we analytically and experimentally show that starting with OUE and then with SUE provides higher data utility(i.e.,L-OSUE).Also,for attributes with small domain sizes,we propose Longitudinal GRR(L-GRR),which provides higher utility than the other protocols based on unary encoding.Last,we also propose a new solution named Adaptive LDP for LOngitudinal and Multidimensional FREquency Estimates(ALLOMFREE),which randomly samples a single attribute to be sent with the whole privacy budget and adaptively selects the optimal protocol,i.e.,either L-GRR or L-OSUE.As shown in the results,ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimates.展开更多
Local differential privacy(LDP)approaches to collecting sensitive information for frequent itemset mining(FIM)can reliably guarantee privacy.Most current approaches to FIM under LDP add"padding and sampling"...Local differential privacy(LDP)approaches to collecting sensitive information for frequent itemset mining(FIM)can reliably guarantee privacy.Most current approaches to FIM under LDP add"padding and sampling"steps to obtain frequent itemsets and their frequencies because each user transaction represents a set of items.The current state-of-the-art approach,namely set-value itemset mining(SVSM),must balance variance and bias to achieve accurate results.Thus,an unbiased FIM approach with lower variance is highly promising.To narrow this gap,we propose an Item-Level LDP frequency oracle approach,named the Integrated-with-Hadamard-Transform-Based Frequency Oracle(IHFO).For the first time,Hadamard encoding is introduced to a set of values to encode all items into a fixed vector,and perturbation can be subsequently applied to the vector.An FIM approach,called optimized united itemset mining(O-UISM),is pro-posed to combine the padding-and-sampling-based frequency oracle(PSFO)and the IHFO into a framework for acquiring accurate frequent itemsets with their frequencies.Finally,we theoretically and experimentally demonstrate that O-UISM significantly outperforms the extant approaches in finding frequent itemsets and estimating their frequencies under the same privacy guarantee.展开更多
The fast development of the Internet and mobile devices results in a crowdsensing business model,where individuals(users)are willing to contribute their data to help the institution(data collector)analyze and release ...The fast development of the Internet and mobile devices results in a crowdsensing business model,where individuals(users)are willing to contribute their data to help the institution(data collector)analyze and release useful information.However,the reveal of personal data will bring huge privacy threats to users,which will impede the wide application of the crowdsensing model.To settle the problem,the definition of local differential privacy(LDP)is proposed.Afterwards,to respond to the varied privacy preference of users,resear-chers propose a new model,i.e.,personalized local differential privacy(PLDP),which allow users to specify their own privacy parameters.In this paper,we focus on a basic task of calculating the mean value over a single numeric attribute with PLDP.Based on the previous schemes for mean estimation under LDP,we employ PLDP model to design novel schemes(LAP,DCP,PWP)to provide personalized privacy for each user.We then theoretically analysis the worst-case variance of three proposed schemes and conduct experiments on synthetic and real datasets to evaluate the performance of three methods.The theoretical and experimental results show the optimality of PWP in the low privacy regime and a slight advantage of DCP in the high privacy regime.展开更多
Recently,local differential privacy(LDP)has been used as the de facto standard for data sharing and analyzing with high-level privacy guarantees.Existing LDP-based mechanisms mainly focus on learning statistical infor...Recently,local differential privacy(LDP)has been used as the de facto standard for data sharing and analyzing with high-level privacy guarantees.Existing LDP-based mechanisms mainly focus on learning statistical information about the entire population from sensitive data.For the first time in the literature,we use LDP for distance estimation between distributed data to support more complicated data analysis.Specifically,we propose PrivBV—a locally differentially private bit vector mechanism with a distance-aware property in the anonymized space.We also present an optimization strategy for reducing privacy leakage in the high-dimensional space.The distance-aware property of PrivBV brings new insights into complicated data analysis in distributed environments.As study cases,we show the feasibility of applying PrivBV to privacy-preserving record linkage and non-interactive clustering.Theoretical analysis and experimental results demonstrate the effectiveness of the proposed scheme.展开更多
Local differential privacy(LDP),which is a technique that employs unbiased statistical estimations instead of real data,is usually adopted in data collection,as it can protect every user’s privacy and prevent the lea...Local differential privacy(LDP),which is a technique that employs unbiased statistical estimations instead of real data,is usually adopted in data collection,as it can protect every user’s privacy and prevent the leakage of sensitive information.The segment pairs method(SPM),multiple-channel method(MCM)and prefix extending method(PEM)are three known LDP protocols for heavy hitter identification as well as the frequency oracle(FO)problem with large domains.However,the low scalability of these three LDP algorithms often limits their application.Specifically,communication and computation strongly affect their efficiency.Moreover,excessive grouping or sharing of privacy budgets makes the results inaccurate.To address the abovementioned problems,this study proposes independent channel(IC)and mixed independent channel(MIC),which are efficient LDP protocols for FO with a large domains.We design a flexible method for splitting a large domain to reduce the number of sub-domains.Further,we employ the false positive rate with interaction to obtain an accurate estimation.Numerical experiments demonstrate that IC outperforms all the existing solutions under the same privacy guarantee while MIC performs well under a small privacy budget with the lowest communication cost.展开更多
With the development of information technology,a mass of data are generated every day.Collecting and analysing these data help service providers improve their services and gain an advantage in the fierce market compet...With the development of information technology,a mass of data are generated every day.Collecting and analysing these data help service providers improve their services and gain an advantage in the fierce market competition.K-means clustering has been widely used for cluster analysis in real life.However,these analyses are based on users’data,which disclose users’privacy.Local differential privacy has attracted lots of attention recently due to its strong privacy guarantee and has been applied for clustering analysis.However,existing K-means clustering methods with local differential privacy protection cannot get an ideal clustering result due to the large amount of noise introduced to the whole dataset to ensure the privacy guarantee.To solve this problem,we propose a novel method that provides local distance privacy for users who participate in the clustering analysis.Instead of making the users’records in-distinguish from each other in high-dimensional space,we map the user’s record into a one-dimensional distance space and make the records in such a distance space not be distinguished from each other.To be specific,we generate a noisy distance first and then synthesize the high-dimensional data record.We propose a Bounded Laplace Method(BLM)and a Cluster Indistinguishable Method(CIM)to sample such a noisy distance,which satisfies the local differential privacy guarantee and local dE-privacy guarantee,respectively.Furthermore,we introduce a way to generate synthetic data records in high-dimensional space.Our experimental evaluation results show that our methods outperform the traditional methods significantly.展开更多
To solve the privacy leakage problem of truck trajectories in intelligent logistics,this paper proposes a quadtreebased personalized joint location perturbation(QPJLP)algorithm using location generalization and local ...To solve the privacy leakage problem of truck trajectories in intelligent logistics,this paper proposes a quadtreebased personalized joint location perturbation(QPJLP)algorithm using location generalization and local differential privacy(LDP)techniques.Firstly,a flexible position encoding mechanism based on the spatial quadtree indexing is designed,and the length of the encoding can be adjusted freely according to data availability.Secondly,to meet the privacy needs of different locations of users,location categories are introduced to classify locations as sensitive and ordinary locations.Finally,the truck invokes the corresponding mechanism in the QPJLP algorithm to locally perturb the code according to the location category,allowing the protection of non-sensitive locations to be reduced without weakening the protection of sensitive locations,thereby improving data availability.Simulation experiments demonstrate that the proposed algorithm effectively meets the personalized trajectory privacy requirements while also exhibiting good performance in trajectory proportion estimation and top-k classification.展开更多
In view of the privacy security issues such as location information leakage in the interaction process between the base station and the sensor nodes in the sensor-cloud system, a base station location privacy protecti...In view of the privacy security issues such as location information leakage in the interaction process between the base station and the sensor nodes in the sensor-cloud system, a base station location privacy protection algorithm based on local differential privacy(LDP) is proposed. Firstly, through the local obfuscation algorithm(LOA), the base station can get the data of the real location and the pseudo location by flipping a coin, and then send the data to the fog layer, then the obfuscation location domain set is obtained. Secondly, in order to reconstruct the location distribution of the real location and the pseudo location in the base station, the location domain of the base station is divided into several decentralized sub-regions, and a privacy location reconstruction algorithm(PLRA) is performed in each sub-region. Finally, the base station correlates the location information of each sub-region, and then uploads the data information containing the disturbance location to the fog node layer. The simulation results show that compared with the existing base station location anonymity and security technique(BLAST) algorithm, the proposed method not only reduce the algorithm’s running time and network delay, but also improve the data availability. So the proposed method can protect the location privacy of the base station more safely and efficiently.展开更多
基金supported by a grant fromthe National Key R&DProgram of China.
文摘In recent years,the research field of data collection under local differential privacy(LDP)has expanded its focus fromelementary data types to includemore complex structural data,such as set-value and graph data.However,our comprehensive review of existing literature reveals that there needs to be more studies that engage with key-value data collection.Such studies would simultaneously collect the frequencies of keys and the mean of values associated with each key.Additionally,the allocation of the privacy budget between the frequencies of keys and the means of values for each key does not yield an optimal utility tradeoff.Recognizing the importance of obtaining accurate key frequencies and mean estimations for key-value data collection,this paper presents a novel framework:the Key-Strategy Framework forKey-ValueDataCollection under LDP.Initially,theKey-StrategyUnary Encoding(KS-UE)strategy is proposed within non-interactive frameworks for the purpose of privacy budget allocation to achieve precise key frequencies;subsequently,the Key-Strategy Generalized Randomized Response(KS-GRR)strategy is introduced for interactive frameworks to enhance the efficiency of collecting frequent keys through group-anditeration methods.Both strategies are adapted for scenarios in which users possess either a single or multiple key-value pairs.Theoretically,we demonstrate that the variance of KS-UE is lower than that of existing methods.These claims are substantiated through extensive experimental evaluation on real-world datasets,confirming the effectiveness and efficiency of the KS-UE and KS-GRR strategies.
文摘With the development of Internet of Things(IoT),the delay caused by network transmission has led to low data processing efficiency.At the same time,the limited computing power and available energy consumption of IoT terminal devices are also the important bottlenecks that would restrict the application of blockchain,but edge computing could solve this problem.The emergence of edge computing can effectively reduce the delay of data transmission and improve data processing capacity.However,user data in edge computing is usually stored and processed in some honest-but-curious authorized entities,which leads to the leakage of users’privacy information.In order to solve these problems,this paper proposes a location data collection method that satisfies the local differential privacy to protect users’privacy.In this paper,a Voronoi diagram constructed by the Delaunay method is used to divide the road network space and determine the Voronoi grid region where the edge nodes are located.A random disturbance mechanism that satisfies the local differential privacy is utilized to disturb the original location data in each Voronoi grid.In addition,the effectiveness of the proposed privacy-preserving mechanism is verified through comparison experiments.Compared with the existing privacy-preserving methods,the proposed privacy-preserving mechanism can not only better meet users’privacy needs,but also have higher data availability.
基金supported by the National Key R&D Program of China under Grant 2020YFB1806904by the National Natural Science Foundation of China under Grants 61872416,62171189,62172438 and 62071192+1 种基金by the Fundamental Research Funds for the Central Universities of China under Grant 2019kfyXJJS017,31732111303,31512111310by the special fund for Wuhan Yellow Crane Talents(Excellent Young Scholar).
文摘Federated Learning(FL)is a new computing paradigm in privacy-preserving Machine Learning(ML),where the ML model is trained in a decentralized manner by the clients,preventing the server from directly accessing privacy-sensitive data from the clients.Unfortunately,recent advances have shown potential risks for user-level privacy breaches under the cross-silo FL framework.In this paper,we propose addressing the issue by using a three-plane framework to secure the cross-silo FL,taking advantage of the Local Differential Privacy(LDP)mechanism.The key insight here is that LDP can provide strong data privacy protection while still retaining user data statistics to preserve its high utility.Experimental results on three real-world datasets demonstrate the effectiveness of our framework.
基金This work is supported by Major Scientific and Technological Special Project of Guizhou Province(20183001)Research on the education mode for complicate skill students in new media with cross specialty integration(22150117092)+2 种基金Open Foundation of Guizhou Provincial Key Laboratory of Public Big Data(2018BDKFJJ014)Open Foundation of Guizhou Provincial Key Laboratory of Public Big Data(2018BDKFJJ019)Open Foundation of Guizhou Provincial Key Laboratory of Public Big Data(2018BDKFJJ022).
文摘In recent years,with the continuous advancement of the intelligent process of the Internet of Vehicles(IoV),the problem of privacy leakage in IoV has become increasingly prominent.The research on the privacy protection of the IoV has become the focus of the society.This paper analyzes the advantages and disadvantages of the existing location privacy protection system structure and algorithms,proposes a privacy protection system structure based on untrusted data collection server,and designs a vehicle location acquisition algorithm based on a local differential privacy and game model.The algorithm first meshes the road network space.Then,the dynamic game model is introduced into the game user location privacy protection model and the attacker location semantic inference model,thereby minimizing the possibility of exposing the regional semantic privacy of the k-location set while maximizing the availability of the service.On this basis,a statistical method is designed,which satisfies the local differential privacy of k-location sets and obtains unbiased estimation of traffic density in different regions.Finally,this paper verifies the algorithm based on the data set of mobile vehicles in Shanghai.The experimental results show that the algorithm can guarantee the user’s location privacy and location semantic privacy while satisfying the service quality requirements,and provide better privacy protection and service for the users of the IoV.
基金supported in part by the National Natural Science Foundation of China under Grant No.61972371Youth Innovation Promotion Association of Chinese Academy of Sciences(CAS)under Grant No.Y202093.
文摘By integrating the traditional power grid with information and communication technology, smart grid achieves dependable, efficient, and flexible grid data processing. The smart meters deployed on the user side of the smart grid collect the users' power usage data on a regular basis and upload it to the control center to complete the smart grid data acquisition. The control center can evaluate the supply and demand of the power grid through aggregated data from users and then dynamically adjust the power supply and price, etc. However, since the grid data collected from users may disclose the user's electricity usage habits and daily activities, privacy concern has become a critical issue in smart grid data aggregation. Most of the existing privacy-preserving data collection schemes for smart grid adopt homomorphic encryption or randomization techniques which are either impractical because of the high computation overhead or unrealistic for requiring a trusted third party.
基金This paper is supported by the Inner Mongolia Natural Science Foundation(Grant Number:2018MS06026,Sponsored Authors:Liu,H.and Ma,X.,Sponsors’Websites:http://kjt.nmg.gov.cn/)the Science and Technology Program of Inner Mongolia Autonomous Region(Grant Number:2019GG116,Sponsored Authors:Liu,H.and Ma,X.,Sponsors’Websites:http://kjt.nmg.gov.cn/).
文摘Frequent itemset mining is an essential problem in data mining and plays a key role in many data mining applications.However,users’personal privacy will be leaked in the mining process.In recent years,application of local differential privacy protection models to mine frequent itemsets is a relatively reliable and secure protection method.Local differential privacy means that users first perturb the original data and then send these data to the aggregator,preventing the aggregator from revealing the user’s private information.We propose a novel framework that implements frequent itemset mining under local differential privacy and is applicable to user’s multi-attribute.The main technique has bitmap encoding for converting the user’s original data into a binary string.It also includes how to choose the best perturbation algorithm for varying user attributes,and uses the frequent pattern tree(FP-tree)algorithm to mine frequent itemsets.Finally,we incorporate the threshold random response(TRR)algorithm in the framework and compare it with the existing algorithms,and demonstrate that the TRR algorithm has higher accuracy for mining frequent itemsets.
基金supported by National Natural Science Foundation of China(No.61871037)supported by Natural Science Foundation of Beijing(No.M21035).
文摘Mobile edge computing(MEC)is an emerging technolohgy that extends cloud computing to the edge of a network.MEC has been applied to a variety of services.Specially,MEC can help to reduce network delay and improve the service quality of recommendation systems.In a MEC-based recommendation system,users’rating data are collected and analyzed by the edge servers.If the servers behave dishonestly or break down,users’privacy may be disclosed.To solve this issue,we design a recommendation framework that applies local differential privacy(LDP)to collaborative filtering.In the proposed framework,users’rating data are perturbed to satisfy LDP and then released to the edge servers.The edge servers perform partial computing task by using the perturbed data.The cloud computing center computes the similarity between items by using the computing results generated by edge servers.We propose a data perturbation method to protect user’s original rating values,where the Harmony mechanism is modified so as to preserve the accuracy of similarity computation.And to enhance the protection of privacy,we propose two methods to protect both users’rating values and rating behaviors.Experimental results on real-world data demonstrate that the proposed methods perform better than existing differentially private recommendation methods.
文摘The structure of key-value data is a typical data structure generated by mobile devices.The collection and analysis of the data from mobile devices are critical for service providers to improve service quality.Nevertheless,collecting raw data,which may contain various per⁃sonal information,would lead to serious personal privacy leaks.Local differential privacy(LDP)has been proposed to protect privacy on the device side so that the server cannot obtain the raw data.However,existing mechanisms assume that all keys are equally sensitive,which can⁃not produce high-precision statistical results.A utility-improved data collection framework with LDP for key-value formed mobile data is pro⁃posed to solve this issue.More specifically,we divide the key-value data into sensitive and non-sensitive parts and only provide an LDPequivalent privacy guarantee for sensitive keys and all values.We instantiate our framework by using a utility-improved key value-unary en⁃coding(UKV-UE)mechanism based on unary encoding,with which our framework can work effectively for a large key domain.We then vali⁃date our mechanism which provides better utility and is suitable for mobile devices by evaluating it in two real datasets.Finally,some pos⁃sible future research directions are envisioned.
基金supported by the Agence Nationale de la Recherche(ANR)(contract“ANR-17-EURE-0002”)by the Region of Bourgogne Franche-ComtéCADRAN Projectsupported by the European Research Council(ERC)project HYPATIA under the European Union's Horizon 2020 research and innovation programme.Grant agreement n.835294。
文摘This paper investigates the problem of collecting multidimensional data throughout time(i.e.,longitudinal studies)for the fundamental task of frequency estimation under Local Differential Privacy(LDP)guarantees.Contrary to frequency estimation of a single attribute,the multidimensional aspect demands particular attention to the privacy budget.Besides,when collecting user statistics longitudinally,privacy progressively degrades.Indeed,the“multiple”settings in combination(i.e.,many attributes and several collections throughout time)impose several challenges,for which this paper proposes the first solution for frequency estimates under LDP.To tackle these issues,we extend the analysis of three state-of-the-art LDP protocols(Generalized Randomized Response–GRR,Optimized Unary Encoding–OUE,and Symmetric Unary Encoding–SUE)for both longitudinal and multidimensional data collections.While the known literature uses OUE and SUE for two rounds of sanitization(a.k.a.memoization),i.e.,L-OUE and L-SUE,respectively,we analytically and experimentally show that starting with OUE and then with SUE provides higher data utility(i.e.,L-OSUE).Also,for attributes with small domain sizes,we propose Longitudinal GRR(L-GRR),which provides higher utility than the other protocols based on unary encoding.Last,we also propose a new solution named Adaptive LDP for LOngitudinal and Multidimensional FREquency Estimates(ALLOMFREE),which randomly samples a single attribute to be sent with the whole privacy budget and adaptively selects the optimal protocol,i.e.,either L-GRR or L-OSUE.As shown in the results,ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimates.
基金supported by the National Natural Science Foundation of China under Grant Nos.61772537,61772536,62072460,62076245,and 62172424the National Key Research and Development Program of China under Grant No.2018YFB1004401Beijing Natural Science Foundation under Grant No.4212022.
文摘Local differential privacy(LDP)approaches to collecting sensitive information for frequent itemset mining(FIM)can reliably guarantee privacy.Most current approaches to FIM under LDP add"padding and sampling"steps to obtain frequent itemsets and their frequencies because each user transaction represents a set of items.The current state-of-the-art approach,namely set-value itemset mining(SVSM),must balance variance and bias to achieve accurate results.Thus,an unbiased FIM approach with lower variance is highly promising.To narrow this gap,we propose an Item-Level LDP frequency oracle approach,named the Integrated-with-Hadamard-Transform-Based Frequency Oracle(IHFO).For the first time,Hadamard encoding is introduced to a set of values to encode all items into a fixed vector,and perturbation can be subsequently applied to the vector.An FIM approach,called optimized united itemset mining(O-UISM),is pro-posed to combine the padding-and-sampling-based frequency oracle(PSFO)and the IHFO into a framework for acquiring accurate frequent itemsets with their frequencies.Finally,we theoretically and experimentally demonstrate that O-UISM significantly outperforms the extant approaches in finding frequent itemsets and estimating their frequencies under the same privacy guarantee.
基金the National Key Research and Development Program of China(2020YFB1005900)the Research Fund of Guangxi Key Laboratory of Trusted Software(kx202034)+1 种基金the Team Project of Collaborative Innovation in Universities of Gansu Province(2017C-16)Collaborative Innovation Center of Novel Software Technology and Industrialization.
文摘The fast development of the Internet and mobile devices results in a crowdsensing business model,where individuals(users)are willing to contribute their data to help the institution(data collector)analyze and release useful information.However,the reveal of personal data will bring huge privacy threats to users,which will impede the wide application of the crowdsensing model.To settle the problem,the definition of local differential privacy(LDP)is proposed.Afterwards,to respond to the varied privacy preference of users,resear-chers propose a new model,i.e.,personalized local differential privacy(PLDP),which allow users to specify their own privacy parameters.In this paper,we focus on a basic task of calculating the mean value over a single numeric attribute with PLDP.Based on the previous schemes for mean estimation under LDP,we employ PLDP model to design novel schemes(LAP,DCP,PWP)to provide personalized privacy for each user.We then theoretically analysis the worst-case variance of three proposed schemes and conduct experiments on synthetic and real datasets to evaluate the performance of three methods.The theoretical and experimental results show the optimality of PWP in the low privacy regime and a slight advantage of DCP in the high privacy regime.
基金supported by National Key Research and Development Program of China(Nos.2019QY1402 and 2016YFB0800901)。
文摘Recently,local differential privacy(LDP)has been used as the de facto standard for data sharing and analyzing with high-level privacy guarantees.Existing LDP-based mechanisms mainly focus on learning statistical information about the entire population from sensitive data.For the first time in the literature,we use LDP for distance estimation between distributed data to support more complicated data analysis.Specifically,we propose PrivBV—a locally differentially private bit vector mechanism with a distance-aware property in the anonymized space.We also present an optimization strategy for reducing privacy leakage in the high-dimensional space.The distance-aware property of PrivBV brings new insights into complicated data analysis in distributed environments.As study cases,we show the feasibility of applying PrivBV to privacy-preserving record linkage and non-interactive clustering.Theoretical analysis and experimental results demonstrate the effectiveness of the proposed scheme.
基金This work was supported by the National Key R&D Program of China(2018YFB1004401)the National Natural Science Foundation of China(NSFC)(Grant Nos.61772537,61772536,62072460,and 62076245)Beijing Natural Science Foundation(4212022).
文摘Local differential privacy(LDP),which is a technique that employs unbiased statistical estimations instead of real data,is usually adopted in data collection,as it can protect every user’s privacy and prevent the leakage of sensitive information.The segment pairs method(SPM),multiple-channel method(MCM)and prefix extending method(PEM)are three known LDP protocols for heavy hitter identification as well as the frequency oracle(FO)problem with large domains.However,the low scalability of these three LDP algorithms often limits their application.Specifically,communication and computation strongly affect their efficiency.Moreover,excessive grouping or sharing of privacy budgets makes the results inaccurate.To address the abovementioned problems,this study proposes independent channel(IC)and mixed independent channel(MIC),which are efficient LDP protocols for FO with a large domains.We design a flexible method for splitting a large domain to reduce the number of sub-domains.Further,we employ the false positive rate with interaction to obtain an accurate estimation.Numerical experiments demonstrate that IC outperforms all the existing solutions under the same privacy guarantee while MIC performs well under a small privacy budget with the lowest communication cost.
文摘With the development of information technology,a mass of data are generated every day.Collecting and analysing these data help service providers improve their services and gain an advantage in the fierce market competition.K-means clustering has been widely used for cluster analysis in real life.However,these analyses are based on users’data,which disclose users’privacy.Local differential privacy has attracted lots of attention recently due to its strong privacy guarantee and has been applied for clustering analysis.However,existing K-means clustering methods with local differential privacy protection cannot get an ideal clustering result due to the large amount of noise introduced to the whole dataset to ensure the privacy guarantee.To solve this problem,we propose a novel method that provides local distance privacy for users who participate in the clustering analysis.Instead of making the users’records in-distinguish from each other in high-dimensional space,we map the user’s record into a one-dimensional distance space and make the records in such a distance space not be distinguished from each other.To be specific,we generate a noisy distance first and then synthesize the high-dimensional data record.We propose a Bounded Laplace Method(BLM)and a Cluster Indistinguishable Method(CIM)to sample such a noisy distance,which satisfies the local differential privacy guarantee and local dE-privacy guarantee,respectively.Furthermore,we introduce a way to generate synthetic data records in high-dimensional space.Our experimental evaluation results show that our methods outperform the traditional methods significantly.
基金Key Scientific Research Projects of Colleges and Universities in Henan Province(23A520033)Doctoral Scientific Fund of Henan Polytechnic University(B2022-16).
文摘To solve the privacy leakage problem of truck trajectories in intelligent logistics,this paper proposes a quadtreebased personalized joint location perturbation(QPJLP)algorithm using location generalization and local differential privacy(LDP)techniques.Firstly,a flexible position encoding mechanism based on the spatial quadtree indexing is designed,and the length of the encoding can be adjusted freely according to data availability.Secondly,to meet the privacy needs of different locations of users,location categories are introduced to classify locations as sensitive and ordinary locations.Finally,the truck invokes the corresponding mechanism in the QPJLP algorithm to locally perturb the code according to the location category,allowing the protection of non-sensitive locations to be reduced without weakening the protection of sensitive locations,thereby improving data availability.Simulation experiments demonstrate that the proposed algorithm effectively meets the personalized trajectory privacy requirements while also exhibiting good performance in trajectory proportion estimation and top-k classification.
基金supported by the National Natural Science Foundation of China (61202458, 61403109)the Natural Science Foundation of Heilongjiang Province of China(LH2020F034)the Harbin Science and Technology Innovation Research Funds (2016RAQXJ036)。
文摘In view of the privacy security issues such as location information leakage in the interaction process between the base station and the sensor nodes in the sensor-cloud system, a base station location privacy protection algorithm based on local differential privacy(LDP) is proposed. Firstly, through the local obfuscation algorithm(LOA), the base station can get the data of the real location and the pseudo location by flipping a coin, and then send the data to the fog layer, then the obfuscation location domain set is obtained. Secondly, in order to reconstruct the location distribution of the real location and the pseudo location in the base station, the location domain of the base station is divided into several decentralized sub-regions, and a privacy location reconstruction algorithm(PLRA) is performed in each sub-region. Finally, the base station correlates the location information of each sub-region, and then uploads the data information containing the disturbance location to the fog node layer. The simulation results show that compared with the existing base station location anonymity and security technique(BLAST) algorithm, the proposed method not only reduce the algorithm’s running time and network delay, but also improve the data availability. So the proposed method can protect the location privacy of the base station more safely and efficiently.