As a distributed machine learning method, federated learning (FL) naturally protects data privacy: it keeps data on the client and trains local models on local data. FL thereby addresses both the data-island problem in artificial intelligence and privacy protection concerns. However, existing research shows that attackers may still steal user information by analyzing the parameters exchanged during federated training and the aggregated parameters on the server side. Differential privacy (DP) techniques are widely used to counter this threat in federated learning, but adding Gaussian noise perturbations degrades the model's learning performance. To address these issues, this paper proposes a differentially private federated learning scheme based on adaptive Gaussian noise (DPFL-AGN). To protect data privacy and security during federated training, adaptive Gaussian noise is added in the training process to hide the real parameters uploaded by the clients. In addition, this paper proposes an adaptive noise reduction method: as the model converges, the Gaussian noise in the later stages of training is reduced adaptively. A series of simulation experiments on the real MNIST and CIFAR-10 datasets shows that DPFL-AGN performs better than the other algorithms compared.
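The idea of hiding client updates behind Gaussian noise that shrinks as training converges can be sketched as follows. This is a minimal illustration, not the DPFL-AGN algorithm itself: the abstract does not give the noise schedule, so the geometric decay in `noise_scale`, the function names, and all default values are assumptions, and the privacy accounting that a real scheme needs is omitted.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale the update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def noise_scale(round_idx, sigma0=1.0, decay=0.9, sigma_min=0.1):
    """Hypothetical schedule: geometric decay per round, with a floor."""
    return max(sigma_min, sigma0 * decay ** round_idx)


def privatize_update(update, round_idx, clip_norm=1.0, rng=random):
    """Clip a client update, then add Gaussian noise for this round."""
    clipped = clip_update(update, clip_norm)
    sigma = noise_scale(round_idx) * clip_norm  # noise std for this round
    return [x + rng.gauss(0.0, sigma) for x in clipped]


if __name__ == "__main__":
    rng = random.Random(0)
    for t in (0, 10, 30):
        print(t, noise_scale(t), privatize_update([3.0, 4.0], t, rng=rng))
```

Lower noise in later rounds means each of those rounds consumes more of the privacy budget, so a complete scheme would have to track the cumulative guarantee across rounds.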
In recent years, research on data collection under local differential privacy (LDP) has expanded from elementary data types to more complex structured data, such as set-valued and graph data. However, our review of the existing literature reveals that few studies address key-value data collection, which requires simultaneously collecting the frequencies of keys and the mean of the values associated with each key. Moreover, existing allocations of the privacy budget between key frequencies and per-key value means do not yield an optimal utility tradeoff. Recognizing the importance of accurate key frequency and mean estimation in key-value data collection, this paper presents a novel framework: the Key-Strategy Framework for Key-Value Data Collection under LDP. First, the Key-Strategy Unary Encoding (KS-UE) strategy is proposed for non-interactive frameworks; it allocates the privacy budget so as to achieve precise key frequencies. Second, the Key-Strategy Generalized Randomized Response (KS-GRR) strategy is introduced for interactive frameworks to collect frequent keys more efficiently through group-and-iteration methods. Both strategies are adapted to scenarios in which users hold either a single key-value pair or multiple pairs. Theoretically, we demonstrate that the variance of KS-UE is lower than that of existing methods. These claims are substantiated by extensive experimental evaluation on real-world datasets, which confirms the effectiveness and efficiency of the KS-UE and KS-GRR strategies.
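For readers unfamiliar with the building block behind KS-GRR, the sketch below shows plain generalized randomized response (GRR) for key frequency estimation. It is the standard textbook mechanism, not the paper's KS-GRR strategy; the function names and the toy domain are illustrative assumptions, and the domain is assumed to contain at least two keys.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng=random):
    """Report the true value with probability p, else a uniform other value."""
    d = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def grr_estimate(reports, domain, epsilon):
    """Unbiased frequency estimates from the perturbed reports."""
    n = len(reports)
    d = len(domain)
    e = math.exp(epsilon)
    p = e / (e + d - 1)    # probability of keeping the true value
    q = 1.0 / (e + d - 1)  # probability of reporting one specific other value
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


if __name__ == "__main__":
    rng = random.Random(0)
    domain = ["a", "b", "c"]
    truth = ["a"] * 12000 + ["b"] * 6000 + ["c"] * 2000
    reports = [grr_perturb(v, domain, 2.0, rng) for v in truth]
    print(grr_estimate(reports, domain, 2.0))
```

The estimator subtracts the expected share of false reports `q` and rescales by `p - q`, which is why its variance grows as the key domain gets larger or the privacy budget gets smaller.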
The proliferation of Large Language Models (LLMs) across various sectors has underscored the urgency of addressing potential privacy breaches. Vulnerabilities such as prompt injection attacks and other adversarial tactics could make these models inadvertently disclose their training data. Such disclosures could compromise personally identifiable information (PII), posing significant privacy risks. In this paper, we propose a novel multi-faceted approach called Whispered Tuning to address privacy leaks in LLMs. We integrate a PII redaction model, differential privacy techniques, and an output filter into the LLM fine-tuning process to enhance confidentiality. Additionally, we introduce novel ideas such as the Epsilon Dial for adjustable privacy budgeting, with differentiated training phases per data handler role. Through empirical validation, including attacks on non-private models, we demonstrate the robustness of our proposed solution, SecureNLP, in safeguarding privacy without compromising utility. This methodology significantly fortifies LLMs against privacy infringements, enabling responsible adoption across sectors.
By integrating the traditional power grid with information and communication technology, the smart grid achieves dependable, efficient, and flexible grid data processing. Smart meters deployed on the user side of the smart grid collect users' power usage data at regular intervals and upload it to the control center to complete smart grid data acquisition. The control center can evaluate the supply and demand of the power grid from the aggregated user data and then dynamically adjust the power supply, price, and so on. However, since the grid data collected from users may disclose their electricity usage habits and daily activities, privacy has become a critical issue in smart grid data aggregation. Most existing privacy-preserving data collection schemes for the smart grid adopt homomorphic encryption or randomization techniques, which are either impractical because of high computation overhead or unrealistic because they require a trusted third party.
With the development of the Internet of Things (IoT), the delay caused by network transmission has led to low data processing efficiency. At the same time, the limited computing power and energy of IoT terminal devices are important bottlenecks that restrict the application of blockchain. Edge computing can address these problems: it effectively reduces data transmission delay and improves data processing capacity. However, user data in edge computing is usually stored and processed by honest-but-curious authorized entities, which can leak users' private information. To solve this problem, this paper proposes a location data collection method that satisfies local differential privacy. A Voronoi diagram constructed by the Delaunay method is used to divide the road network space and determine the Voronoi grid region in which each edge node is located. A random perturbation mechanism that satisfies local differential privacy then perturbs the original location data in each Voronoi grid. The effectiveness of the proposed privacy-preserving mechanism is verified through comparison experiments. Compared with existing privacy-preserving methods, the proposed mechanism not only better meets users' privacy needs but also achieves higher data availability.
Federated Learning (FL) is a new computing paradigm in privacy-preserving Machine Learning (ML), in which the ML model is trained in a decentralized manner by the clients, preventing the server from directly accessing the clients' privacy-sensitive data. Unfortunately, recent advances have shown potential risks of user-level privacy breaches under the cross-silo FL framework. In this paper, we propose addressing the issue with a three-plane framework that secures cross-silo FL by taking advantage of the Local Differential Privacy (LDP) mechanism. The key insight is that LDP can provide strong data privacy protection while still retaining user data statistics, which preserves high utility. Experimental results on three real-world datasets demonstrate the effectiveness of our framework.
In recent years, as the Internet of Vehicles (IoV) has become increasingly intelligent, the problem of privacy leakage in the IoV has become increasingly prominent, and research on IoV privacy protection has become a focus of public attention. This paper analyzes the advantages and disadvantages of existing location privacy protection architectures and algorithms, proposes a privacy protection architecture based on an untrusted data collection server, and designs a vehicle location acquisition algorithm based on local differential privacy and a game model. The algorithm first divides the road network space into a mesh. Then, a dynamic game model is introduced into the user location privacy protection model and the attacker location semantic inference model, thereby minimizing the possibility of exposing the regional semantic privacy of the k-location set while maximizing the availability of the service. On this basis, a statistical method is designed that satisfies local differential privacy for k-location sets and obtains an unbiased estimate of traffic density in different regions. Finally, this paper verifies the algorithm on a dataset of mobile vehicles in Shanghai. The experimental results show that the algorithm guarantees users' location privacy and location semantic privacy while satisfying service quality requirements, and provides better privacy protection and service for IoV users.
Health monitoring data, or data about infectious diseases such as COVID-19, may need to be constantly updated and dynamically released, but such data may contain users' sensitive information. Preserving users' privacy before release is therefore critically important yet challenging. Differential Privacy (DP) is well known to provide effective privacy protection, and dynamic DP-preserving data release was designed to publish a histogram that meets the DP guarantee. Unfortunately, this scheme may result in high cumulative errors and lower data availability. To address this problem, in this paper we apply Jensen-Shannon (JS) divergence to design the OPTICS (Ordering Points To Identify The Clustering Structure) scheme. It uses JS divergence to measure the difference between the updated dataset at the current release time and the private dataset at the previous release time. This difference is compared with a threshold, and only when it exceeds the threshold is OPTICS applied to publish a DP-protected dataset. Our experimental results show that the absolute errors and average relative errors are significantly lower than those of existing works.
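The threshold test described above can be sketched as follows: compute the JS divergence between the current histogram and the last released one, and publish again only if it exceeds a threshold. This is an illustration of the decision rule only; the function names and the threshold value are assumptions, and the OPTICS clustering step and the DP noise addition are not shown. In a real scheme the comparison itself touches private data and would need to be accounted for in the privacy budget.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (in [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def normalize(hist):
    """Turn a histogram of counts into a probability distribution."""
    total = sum(hist)
    return [h / total for h in hist]


def should_republish(current_hist, last_released_hist, threshold):
    """Publish a new release only if the data has drifted enough."""
    divergence = js_divergence(normalize(current_hist), normalize(last_released_hist))
    return divergence > threshold


if __name__ == "__main__":
    print(should_republish([50, 30, 20], [48, 31, 21], threshold=0.01))
    print(should_republish([10, 30, 60], [48, 31, 21], threshold=0.01))
```

Skipping releases when the data has barely changed is what limits the cumulative error: noise is only added when a new release carries new information.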
A social network captures the interactions between its members, which constitute the structure and attributes of the network. These interactive relationships contain a great deal of personal private information, so directly releasing social network data discloses private information. To address the dynamic characteristics of social network data release, a new dynamic social network data publishing method based on differential privacy is proposed. The method, named DDPA (Dynamic Differential Privacy Algorithm), satisfies differential privacy. DDPA improves on privacy protection algorithms for static social network data publishing. It adds Laplace-distributed noise to network edge weights. As the number of iterations increases, DDPA identifies the edge weight information that has changed and assigns it additional privacy budget. Experiments on real datasets show that DDPA satisfies users' privacy requirements in social networks, reduces the execution time incurred by iterations, and reduces the information loss rate of the graph structure.
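Adding Laplace noise to edge weights, the basic step named above, can be sketched as below. This shows only the standard Laplace mechanism applied once to a weighted edge list; the function names, the sensitivity value, and the toy graph are assumptions, and DDPA's handling of changed edges across iterations is not reproduced.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_edge_weights(edges, epsilon, sensitivity=1.0, rng=random):
    """Add Laplace noise with scale sensitivity / epsilon to every edge weight."""
    scale = sensitivity / epsilon
    return {edge: w + laplace_noise(scale, rng) for edge, w in edges.items()}


if __name__ == "__main__":
    rng = random.Random(1)
    graph = {("alice", "bob"): 3.0, ("bob", "carol"): 1.5}
    print(perturb_edge_weights(graph, epsilon=0.5, rng=rng))
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of less accurate weights.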
Frequent itemset mining is an essential problem in data mining and plays a key role in many data mining applications. However, users' personal privacy can be leaked in the mining process. In recent years, applying local differential privacy protection models to frequent itemset mining has emerged as a relatively reliable and secure protection method. Under local differential privacy, users first perturb their original data and then send the perturbed data to the aggregator, preventing the aggregator from learning the users' private information. We propose a novel framework that implements frequent itemset mining under local differential privacy and is applicable to users with multiple attributes. The main technique uses bitmap encoding to convert a user's original data into a binary string. The framework also covers how to choose the best perturbation algorithm for varying user attributes, and it uses the frequent pattern tree (FP-tree) algorithm to mine frequent itemsets. Finally, we incorporate the threshold random response (TRR) algorithm into the framework, compare it with existing algorithms, and demonstrate that the TRR algorithm achieves higher accuracy for mining frequent itemsets.
In recent years, mobile Internet technology and location-based services have been widely applied, and application providers and users have accumulated huge amounts of trajectory data. While publishing and analyzing user trajectory data has brought great convenience, the risk of disclosing user privacy through trajectory data publishing is also becoming more and more prominent. Traditional k-anonymous trajectory data publishing technologies cannot effectively protect user privacy against attackers with strong background knowledge. For privacy-preserving trajectory data publishing, we propose a differential privacy based (k-Ψ)-anonymity method to defend against re-identification and probabilistic inference attacks. The proposed method is divided into two phases. In the first phase, a dummy-based (k-Ψ)-anonymous trajectory data publishing algorithm is given, which improves (k-δ)-anonymity by considering changes of the threshold δ on different road segments and constructing an adaptive threshold set Ψ that takes road network information into account. In the second phase, Laplace noise on the distance between anonymous locations is applied under differential privacy to perturb the anonymous trajectory dataset output by the first phase. Experiments on a real road network dataset show that the proposed method improves trajectory indistinguishability and achieves good data utility while preserving user privacy.
Federated learning is a distributed machine learning technique that trains a global model by exchanging model parameters or intermediate results among multiple data sources. Although federated learning achieves physical isolation of data, the local data of federated learning clients are still at risk of leakage when attacked by malicious individuals. For this reason, combining data protection techniques (e.g., differential privacy) with federated learning is a reliable way to further improve the data security of federated learning models. In this survey, we review recent advances in research on differentially private federated learning models. First, we introduce the workflow of federated learning and the theoretical basis of differential privacy. Then, we review three differentially private federated learning paradigms: central differential privacy, local differential privacy, and distributed differential privacy. After this, we review the algorithmic optimization and communication cost optimization of federated learning models with differential privacy. Finally, we review the applications of federated learning models with differential privacy in various domains. By systematically summarizing the existing research, we propose future research opportunities.
Mobile edge computing (MEC) is an emerging technology that extends cloud computing to the edge of a network. MEC has been applied to a variety of services. In particular, MEC can help reduce network delay and improve the service quality of recommendation systems. In an MEC-based recommendation system, users' rating data are collected and analyzed by the edge servers. If the servers behave dishonestly or break down, users' privacy may be disclosed. To solve this issue, we design a recommendation framework that applies local differential privacy (LDP) to collaborative filtering. In the proposed framework, users' rating data are perturbed to satisfy LDP and then released to the edge servers. The edge servers perform part of the computation using the perturbed data, and the cloud computing center computes the similarity between items from the results generated by the edge servers. We propose a data perturbation method to protect users' original rating values, in which the Harmony mechanism is modified so as to preserve the accuracy of similarity computation. To further enhance privacy protection, we propose two methods that protect both users' rating values and rating behaviors. Experimental results on real-world data demonstrate that the proposed methods perform better than existing differentially private recommendation methods.
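As background for the perturbation step, the sketch below shows the Harmony mechanism in its commonly described one-dimensional form for mean estimation: discretize a value in [-1, 1] to ±1, flip it with randomized response, and rescale so the report is unbiased. The paper modifies Harmony in ways the abstract does not specify, so this is the unmodified baseline only; the function names are illustrative, and ratings are assumed to have been mapped to [-1, 1] beforehand.

```python
import math
import random


def harmony_perturb(v, epsilon, rng=random):
    """Perturb one value v in [-1, 1] into a calibrated report of fixed magnitude."""
    # Step 1: discretize so that the expected value of bit equals v.
    bit = 1 if rng.random() < (1 + v) / 2 else -1
    # Step 2: randomized response, keep the bit with probability e^eps / (e^eps + 1).
    e = math.exp(epsilon)
    if rng.random() >= e / (e + 1):
        bit = -bit
    # Step 3: rescale so that the report is an unbiased estimate of v.
    return bit * (e + 1) / (e - 1)


def estimate_mean(reports):
    """Server side: averaging the calibrated reports estimates the true mean."""
    return sum(reports) / len(reports)


if __name__ == "__main__":
    rng = random.Random(2)
    reports = [harmony_perturb(0.4, 1.0, rng) for _ in range(50000)]
    print(estimate_mean(reports))
```

Each user sends only one of two possible values, so the server learns little about any single rating, while the average over many users still converges to the true mean.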
Breaking down data islands between institutions to enable data collaboration has become a new mode of data sharing that makes full use of the value of data. This paper proposes a distributed, secure data sharing scheme based on the client/server (C/S) communication mode and constructs a federated learning architecture that uses differential privacy to protect training parameters. Clients do not need to share local data; they only upload trained model parameters to achieve data sharing. A distributed parameter update mechanism is introduced into the training process. The server is mainly responsible for issuing training commands and parameters and for aggregating the local model parameters uploaded by the clients. Each client uses the stochastic gradient descent algorithm with gradient clipping, updates its model, and transmits the trained model parameters back to the server after differential privacy processing. To test the performance of the scheme, the model is evaluated from multiple perspectives on medical data, in an application scenario where many medical institutions jointly train a disease detection system. The test results show that, for this specific test dataset and with properly configured parameters, the lowest prediction accuracy is 90.261% and the highest is 94.352%, which indicates that the model performs well. The results also show that the scheme achieves data sharing while protecting data privacy and predicts diseases accurately.
There are growing concerns surrounding the data security of social networks because large amounts of user information and sensitive data are collected. Differential privacy is an effective method for privacy protection that provides rigorous and quantifiable guarantees. Concerning the application of differential privacy in social networks, this paper analyzes current research trends and provides background information, including privacy protection standards and noise mechanisms. Focusing on the privacy protection of social network data publishing, a graph-publishing model is designed to provide differential privacy in social networks in three steps. First, based on the feature of social networks that two nodes sharing certain common properties are connected with higher probability, a raw graph is divided into several disconnected sub-graphs, and the corresponding dense adjacency matrices and the number of bridges are obtained. Second, quad-trees are used to explore the dense regions of the adjacency matrices. Finally, using an exponential mechanism and the leaf nodes of the quad-trees, an adjacency matrix of the sanitized graph is reconstructed. In addition, a set of experiments is conducted to evaluate the model's feasibility, availability, and strengths using three analysis techniques: degree distribution, shortest path, and clustering coefficients.
Key-value data is a typical data structure generated by mobile devices. The collection and analysis of data from mobile devices are critical for service providers to improve service quality. Nevertheless, collecting raw data, which may contain various kinds of personal information, would lead to serious personal privacy leaks. Local differential privacy (LDP) has been proposed to protect privacy on the device side so that the server cannot obtain the raw data. However, existing mechanisms assume that all keys are equally sensitive, which prevents them from producing high-precision statistical results. A utility-improved data collection framework with LDP for key-value formed mobile data is proposed to solve this issue. More specifically, we divide the key-value data into sensitive and non-sensitive parts and provide an LDP-equivalent privacy guarantee only for sensitive keys and all values. We instantiate our framework with a utility-improved key value-unary encoding (UKV-UE) mechanism based on unary encoding, with which our framework can work effectively for a large key domain. We then validate our mechanism, which provides better utility and is suitable for mobile devices, by evaluating it on two real datasets. Finally, some possible future research directions are envisioned.
Privacy protection is a hot research topic in the information security field. An improved XGBoost algorithm is proposed to protect privacy in classification tasks. By combining XGBoost with differential privacy protection, the algorithm can improve classification accuracy while protecting private information. When a CART regression tree is used to build a single decision tree, noise is added according to the Laplace mechanism. Compared with the random forest algorithm, this algorithm can reduce computation cost and prevent overfitting to a certain extent. The experimental results show that the proposed algorithm is more effective than other traditional algorithms while protecting the private information in the training data.
Sharing data while protecting privacy in the industrial Internet is a significant challenge. Traditional machine learning methods require all data to be combined for training; however, this approach can be limited by data availability and privacy concerns. Federated learning (FL) has gained considerable attention because it allows decentralized training on multiple local datasets. However, the training data collected by data providers are often not independent and identically distributed (non-IID), resulting in poor FL performance. This paper proposes a privacy-preserving approach for sharing non-IID data in the industrial Internet using an FL approach based on blockchain technology. To overcome the poor training accuracy caused by non-IID data, we propose dynamically updating the local model based on the divergence between the global and local models. This approach can significantly improve the accuracy of FL training when the dispersion is relatively large. In addition, we design a dynamic gradient clipping algorithm that alleviates the influence of noise on model accuracy while reducing the potential privacy leakage caused by sharing model parameters. Finally, we evaluate the performance of the proposed scheme using commonly used open-source image datasets. The simulation results demonstrate that our method can significantly enhance accuracy while protecting privacy and maintaining efficiency, thereby providing a new solution to data-sharing and privacy-protection challenges in the industrial Internet.
With widespread data collection and processing, privacy-preserving machine learning has become increasingly important in addressing privacy risks to individuals. The support vector machine (SVM) is one of the most elementary models in machine learning, and privacy issues surrounding SVM classifier training have attracted increasing attention. In this paper, we investigate Differential Privacy-compliant Federated Machine Learning with Dimensionality Reduction, called FedDPDR-DPML, which greatly improves data utility while providing strong privacy guarantees. In distributed learning scenarios, participants usually hold unbalanced or small amounts of data. FedDPDR-DPML therefore enables multiple participants to collaboratively learn a global model based on weighted model averaging and knowledge aggregation, after which the server distributes the global model to each participant to improve local data utility. For high-dimensional data, we adopt differential privacy in both the principal component analysis (PCA)-based dimensionality reduction phase and the SVM classifier training phase, which improves model accuracy while achieving strict differential privacy protection. In addition, we train differential privacy (DP)-compliant SVM classifiers by adding noise to the objective function itself, which leads to better data utility. Extensive experiments on three high-dimensional datasets demonstrate that FedDPDR-DPML can achieve high accuracy while ensuring strong privacy protection.
Funding (DPFL-AGN paper): Sichuan Provincial Science and Technology Department Project under Grant 2019YFN0104; Yibin Science and Technology Plan Project under Grant 2021GY008; Sichuan University of Science and Engineering Postgraduate Innovation Fund Project under Grant Y2022154.
Funding (Key-Strategy Framework paper): Supported by a grant from the National Key R&D Program of China.
Abstract: The proliferation of Large Language Models (LLMs) across various sectors underscored the urgency of addressing potential privacy breaches. Vulnerabilities, such as prompt injection attacks and other adversarial tactics, could make these models inadvertently disclose their training data. Such disclosures could compromise personally identifiable information (PII), posing significant privacy risks. In this paper, we proposed a novel multi-faceted approach called Whispered Tuning to address privacy leaks in LLMs. We integrated a PII redaction model, differential privacy techniques, and an output filter into the LLM fine-tuning process to enhance confidentiality. Additionally, we introduced novel ideas like the Epsilon Dial for adjustable privacy budgeting for differentiated Training Phases per data handler role. Through empirical validation, including attacks on non-private models, we demonstrated the robustness of our proposed solution SecureNLP in safeguarding privacy without compromising utility. This pioneering methodology significantly fortified LLMs against privacy infringements, enabling responsible adoption across sectors.
Funding: Supported in part by the National Natural Science Foundation of China under Grant No. 61972371 and by the Youth Innovation Promotion Association of Chinese Academy of Sciences (CAS) under Grant No. Y202093.
Abstract: By integrating the traditional power grid with information and communication technology, the smart grid achieves dependable, efficient, and flexible grid data processing. The smart meters deployed on the user side of the smart grid collect the users' power usage data on a regular basis and upload it to the control center to complete the smart grid data acquisition. The control center can evaluate the supply and demand of the power grid through aggregated data from users and then dynamically adjust the power supply, price, and so on. However, since the grid data collected from users may disclose their electricity usage habits and daily activities, privacy has become a critical issue in smart grid data aggregation. Most existing privacy-preserving data collection schemes for the smart grid adopt homomorphic encryption or randomization techniques, which are either impractical because of their high computation overhead or unrealistic because they require a trusted third party.
Abstract: With the development of the Internet of Things (IoT), the delay caused by network transmission has led to low data processing efficiency. At the same time, the limited computing power and available energy of IoT terminal devices are important bottlenecks that restrict the application of blockchain, but edge computing could solve this problem. The emergence of edge computing can effectively reduce the delay of data transmission and improve data processing capacity. However, user data in edge computing is usually stored and processed by honest-but-curious authorized entities, which leads to the leakage of users’ private information. To solve these problems, this paper proposes a location data collection method that satisfies local differential privacy to protect users’ privacy. A Voronoi diagram constructed by the Delaunay method is used to divide the road network space and determine the Voronoi grid region where the edge nodes are located. A random disturbance mechanism that satisfies local differential privacy is used to perturb the original location data in each Voronoi grid. In addition, the effectiveness of the proposed privacy-preserving mechanism is verified through comparison experiments. Compared with existing privacy-preserving methods, the proposed mechanism not only better meets users’ privacy needs but also offers higher data availability.
Funding: Supported by the National Key R&D Program of China under Grant 2020YFB1806904; by the National Natural Science Foundation of China under Grants 61872416, 62171189, 62172438 and 62071192; by the Fundamental Research Funds for the Central Universities of China under Grants 2019kfyXJJS017, 31732111303 and 31512111310; and by the special fund for Wuhan Yellow Crane Talents (Excellent Young Scholar).
Abstract: Federated Learning (FL) is a new computing paradigm in privacy-preserving Machine Learning (ML), where the ML model is trained in a decentralized manner by the clients, preventing the server from directly accessing privacy-sensitive data from the clients. Unfortunately, recent advances have shown potential risks for user-level privacy breaches under the cross-silo FL framework. In this paper, we propose addressing the issue by using a three-plane framework to secure cross-silo FL, taking advantage of the Local Differential Privacy (LDP) mechanism. The key insight here is that LDP can provide strong data privacy protection while still retaining user data statistics to preserve high utility. Experimental results on three real-world datasets demonstrate the effectiveness of our framework.
Funding: This work is supported by the Major Scientific and Technological Special Project of Guizhou Province (20183001); Research on the education mode for complicate skill students in new media with cross specialty integration (22150117092); the Open Foundation of Guizhou Provincial Key Laboratory of Public Big Data (2018BDKFJJ014); the Open Foundation of Guizhou Provincial Key Laboratory of Public Big Data (2018BDKFJJ019); and the Open Foundation of Guizhou Provincial Key Laboratory of Public Big Data (2018BDKFJJ022).
Abstract: In recent years, with the continuous advancement of the Internet of Vehicles (IoV), the problem of privacy leakage in the IoV has become increasingly prominent, and research on IoV privacy protection has become a focus of attention. This paper analyzes the advantages and disadvantages of existing location privacy protection architectures and algorithms, proposes a privacy protection architecture based on an untrusted data collection server, and designs a vehicle location acquisition algorithm based on local differential privacy and a game model. The algorithm first meshes the road network space. Then, a dynamic game model is introduced between the user location privacy protection model and the attacker location semantic inference model, thereby minimizing the possibility of exposing the regional semantic privacy of the k-location set while maximizing the availability of the service. On this basis, a statistical method is designed that satisfies local differential privacy for k-location sets and obtains an unbiased estimate of traffic density in different regions. Finally, this paper verifies the algorithm on a data set of mobile vehicles in Shanghai. The experimental results show that the algorithm can guarantee the user’s location privacy and location semantic privacy while satisfying the service quality requirements, and provides better privacy protection and service for IoV users.
Funding: Supported in part by the National Natural Science Foundation of China (No. 61672106), in part by the Natural Science Foundation of Beijing, China (L192023), and in part by the project of promoting the Classified Development of Beijing Information Science and Technology University (No. 5112211038, 5112211039).
Abstract: Health monitoring data or data about infectious diseases such as COVID-19 may need to be constantly updated and dynamically released, but they may contain users' sensitive information. Thus, preserving users' privacy before release is critically important yet challenging. Differential Privacy (DP) is well known to provide effective privacy protection, and dynamic DP-preserving data release was therefore designed to publish a histogram that meets the DP guarantee. Unfortunately, this scheme may result in high cumulative errors and lower data availability. To address this problem, in this paper we apply Jensen-Shannon (JS) divergence to design the OPTICS (Ordering Points To Identify The Clustering Structure) scheme. It uses JS divergence to measure the difference between the updated data set at the current release time and the private data set at the previous release time. The difference is compared with a threshold, and only when it exceeds the threshold is OPTICS applied to publish DP-protected data sets. Our experimental results show that the absolute errors and average relative errors are significantly lower than those of existing works.
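The threshold test described above can be illustrated with a short sketch that computes the JS divergence between two histograms and decides whether a new release is needed. The function names and the use of base-2 logarithms are assumptions; the abstract does not say how the scheme computes or protects this comparison, and the sketch omits any privacy cost of the comparison itself.

```python
import math

def normalize(hist):
    """Turn non-negative counts into a probability distribution."""
    total = sum(hist)
    return [h / total for h in hist]

def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(hist_a, hist_b):
    """Jensen-Shannon divergence between two histograms over the same bins (0 to 1 in bits)."""
    p, q = normalize(hist_a), normalize(hist_b)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def should_release(current_hist, last_released_hist, threshold):
    """Publish a fresh release only when the data has drifted past the threshold."""
    return js_divergence(current_hist, last_released_hist) > threshold

# Example: a small drift does not trigger a release, a large one does.
previous = [50, 30, 20]
small_drift = [51, 29, 20]
large_drift = [10, 20, 70]
```

Skipping releases when the drift is small is what limits cumulative error: noise is only added when there is something new to publish.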
Abstract: A social network captures the interactions between its members, which constitute its structure and attributes. These interactive relationships contain a lot of personal privacy information, and directly releasing social network data will disclose it. Targeting the dynamic characteristics of social network data release, a new dynamic social network data publishing method based on differential privacy is proposed. The method satisfies differential privacy and is named DDPA (Dynamic Differential Privacy Algorithm). DDPA is an improvement on privacy protection algorithms for static social network data publishing. DDPA adds Laplace-distributed noise to network edge weights. DDPA identifies the edge weight information that changes as the number of iterations increases, adding the privacy protection budget. Experiments on real data sets show that the DDPA algorithm satisfies users’ privacy requirements in social networks. DDPA reduces the execution time caused by iterations and reduces the information loss rate of the graph structure.
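The core step mentioned above, adding Laplace noise to edge weights, can be sketched as follows. This is the generic Laplace mechanism rather than DDPA: the `sensitivity` default, the dictionary representation of edges, and the function names are assumptions, and the dynamic budget handling from the abstract is not modeled.

```python
import random

def laplace_noise(scale, rng=random):
    """Draw from Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def perturb_edge_weights(edges, epsilon, sensitivity=1.0, rng=random):
    """Add Laplace noise with scale sensitivity / epsilon to every edge weight."""
    scale = sensitivity / epsilon
    return {edge: weight + laplace_noise(scale, rng) for edge, weight in edges.items()}

# Example: a tiny weighted graph keyed by (source, target) pairs.
rng = random.Random(0)
graph = {("alice", "bob"): 5.0, ("bob", "carol"): 2.0, ("alice", "carol"): 1.0}
noisy_graph = perturb_edge_weights(graph, epsilon=1.0, rng=rng)
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of less accurate weights.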
Funding: This paper is supported by the Inner Mongolia Natural Science Foundation (Grant Number: 2018MS06026, Sponsored Authors: Liu, H. and Ma, X., Sponsors’ Websites: http://kjt.nmg.gov.cn/) and the Science and Technology Program of Inner Mongolia Autonomous Region (Grant Number: 2019GG116, Sponsored Authors: Liu, H. and Ma, X., Sponsors’ Websites: http://kjt.nmg.gov.cn/).
Abstract: Frequent itemset mining is an essential problem in data mining and plays a key role in many data mining applications. However, users’ personal privacy can be leaked in the mining process. In recent years, applying local differential privacy protection models to mine frequent itemsets has emerged as a relatively reliable and secure protection method. Local differential privacy means that users first perturb the original data and then send the perturbed data to the aggregator, preventing the aggregator from revealing the user’s private information. We propose a novel framework that implements frequent itemset mining under local differential privacy and is applicable to users with multiple attributes. The main technique uses bitmap encoding to convert the user’s original data into a binary string. The framework also covers how to choose the best perturbation algorithm for varying user attributes, and it uses the frequent pattern tree (FP-tree) algorithm to mine frequent itemsets. Finally, we incorporate the threshold random response (TRR) algorithm in the framework, compare it with existing algorithms, and demonstrate that the TRR algorithm has higher accuracy for mining frequent itemsets.
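The bitmap-then-perturb pipeline described above can be sketched with plain symmetric randomized response applied to each bit. This is not the TRR algorithm, which the abstract does not specify; the function names are illustrative, and `epsilon` here is a per-bit budget, so how a total budget would be split across bits is left open.

```python
import math
import random

def encode_bitmap(items, universe):
    """Bitmap encoding: one bit per item in the universe, set if the user holds it."""
    owned = set(items)
    return [1 if u in owned else 0 for u in universe]

def perturb_bits(bits, epsilon, rng=random):
    """Keep each bit with probability p = e^eps / (e^eps + 1), otherwise flip it."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return [b if rng.random() < p else 1 - b for b in bits]

def estimate_supports(reports, epsilon):
    """Unbiased estimate of the fraction of users holding each item."""
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    sums = [sum(column) for column in zip(*reports)]
    return [(s / n - (1.0 - p)) / (2.0 * p - 1.0) for s in sums]

# Example: 60% of users hold {"milk", "bread"}, 40% hold {"bread"} only.
rng = random.Random(0)
universe = ["milk", "bread", "eggs"]
users = [["milk", "bread"]] * 12000 + [["bread"]] * 8000
reports = [perturb_bits(encode_bitmap(u, universe), 2.0, rng) for u in users]
supports = estimate_supports(reports, 2.0)
```

The estimated supports could then be thresholded to pick candidate frequent items before running a pattern-mining step such as FP-tree.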
Funding: Supported by the Fundamental Research Funds for the Central Universities (No. GK201906009), the CERNET Innovation Project (No. NGII20190704), and the Science and Technology Program of Xi’an City (No. 2019216914GXRC005CG006-GXYD5.2).
Abstract: In recent years, mobile Internet technology and location-based services have been widely applied, and application providers and users have accumulated huge amounts of trajectory data. While publishing and analyzing user trajectory data has brought great convenience, the risk of disclosing user privacy through trajectory data publishing is also becoming more and more prominent. Traditional k-anonymous trajectory data publishing technologies cannot effectively protect user privacy against attackers with strong background knowledge. For privacy-preserving trajectory data publishing, we propose a differential privacy based (k-Ψ)-anonymity method to defend against re-identification and probabilistic inference attacks. The proposed method is divided into two phases. In the first phase, a dummy-based (k-Ψ)-anonymous trajectory data publishing algorithm is given, which improves (k-δ)-anonymity by considering changes of the threshold δ on different road segments and constructing an adaptive threshold set Ψ that takes road network information into account. In the second phase, Laplace noise on the distance between anonymous locations under differential privacy is used to perturb the anonymous trajectory dataset output by the first phase. Experiments on a real road network dataset show that the proposed method improves trajectory indistinguishability and achieves good data utility while preserving user privacy.
Abstract: Federated learning is a distributed machine learning technique that trains a global model by exchanging model parameters or intermediate results among multiple data sources. Although federated learning achieves physical isolation of data, the local data of federated learning clients are still at risk of leakage under attack by malicious individuals. For this reason, combining data protection techniques (e.g., differential privacy techniques) with federated learning is a reliable way to further improve the data security of federated learning models. In this survey, we review recent advances in the research of differentially-private federated learning models. First, we introduce the workflow of federated learning and the theoretical basis of differential privacy. Then, we review three differentially-private federated learning paradigms: central differential privacy, local differential privacy, and distributed differential privacy. After this, we review the algorithmic optimization and communication cost optimization of federated learning models with differential privacy. Finally, we review the applications of federated learning models with differential privacy in various domains. By systematically summarizing the existing research, we propose future research opportunities.
Funding: Supported by the National Natural Science Foundation of China (No. 61871037) and by the Natural Science Foundation of Beijing (No. M21035).
Abstract: Mobile edge computing (MEC) is an emerging technology that extends cloud computing to the edge of a network. MEC has been applied to a variety of services. In particular, MEC can help to reduce network delay and improve the service quality of recommendation systems. In a MEC-based recommendation system, users’ rating data are collected and analyzed by the edge servers. If the servers behave dishonestly or break down, users’ privacy may be disclosed. To solve this issue, we design a recommendation framework that applies local differential privacy (LDP) to collaborative filtering. In the proposed framework, users’ rating data are perturbed to satisfy LDP and then released to the edge servers. The edge servers perform part of the computation using the perturbed data. The cloud computing center computes the similarity between items using the results generated by the edge servers. We propose a data perturbation method to protect users’ original rating values, in which the Harmony mechanism is modified to preserve the accuracy of similarity computation. To further enhance privacy protection, we propose two methods that protect both users’ rating values and rating behaviors. Experimental results on real-world data demonstrate that the proposed methods perform better than existing differentially private recommendation methods.
Funding: This work was supported by Funding of the Nanjing Institute of Technology (No. KE21-451).
Abstract: To realize data sharing and make full use of the value of data, breaking down the data islands between institutions to enable data collaboration has become a new sharing mode. This paper proposes a distributed data security sharing scheme based on the C/S communication mode and constructs a federated learning architecture that uses differential privacy technology to protect training parameters. Clients do not need to share local data; they only need to upload the trained model parameters to achieve data sharing. A distributed parameter update mechanism is introduced in the training process. The server is mainly responsible for issuing training commands and parameters and for aggregating the local model parameters uploaded by the clients. The client mainly uses the stochastic gradient descent algorithm with gradient clipping, updates the model, and transmits the trained model parameters back to the server after differential privacy processing. To test the performance of the scheme, the model is tested from multiple perspectives on medical data, in an application scenario where many medical institutions jointly train a disease detection system. The results show that, for this specific test dataset and with properly configured parameters, the lowest prediction accuracy is 90.261% and the highest is 94.352%, indicating that the model performs well. The results also show that this scheme realizes data sharing while protecting data privacy and predicts diseases accurately.
Funding: Supported by the National Natural Science Foundation of China (No. 61105047); the National High Technology Research and Development Program of China (No. 2015IM030300); the Science and Technology Committee of Shanghai Support Project (No. 14JC1405800); and the Project of the Central Universities Fundamental Research of Tongji University.
Abstract: There are growing concerns surrounding the data security of social networks because large amounts of user information and sensitive data are collected. Differential privacy is an effective method for privacy protection that can provide rigorous and quantitative protection. Concerning the application of differential privacy in social networks, this paper analyzes current research trends and provides background information, including privacy protection standards and noise mechanisms. Focusing on the privacy protection of social network data publishing, a graph-publishing model is designed to provide differential privacy in social networks via three steps. Firstly, based on the feature of social networks that two nodes sharing certain common properties are connected with higher probability, a raw graph is divided into several disconnected sub-graphs, and the corresponding dense adjacency matrices and the number of bridges are obtained. Secondly, taking advantage of quad-trees, dense region exploration of the adjacency matrices is conducted. Finally, using an exponential mechanism and the leaf nodes of the quad-trees, an adjacency matrix of the sanitized graph is reconstructed. In addition, a set of experiments is conducted to evaluate the model's feasibility, availability and strengths using three analysis techniques: degree distribution, shortest path, and clustering coefficients.
Abstract: Key-value data is a typical data structure generated by mobile devices. The collection and analysis of data from mobile devices are critical for service providers to improve service quality. Nevertheless, collecting raw data, which may contain various personal information, would lead to serious personal privacy leaks. Local differential privacy (LDP) has been proposed to protect privacy on the device side so that the server cannot obtain the raw data. However, existing mechanisms assume that all keys are equally sensitive, which cannot produce high-precision statistical results. A utility-improved data collection framework with LDP for key-value formed mobile data is proposed to solve this issue. More specifically, we divide the key-value data into sensitive and non-sensitive parts and only provide an LDP-equivalent privacy guarantee for sensitive keys and all values. We instantiate our framework by using a utility-improved key value-unary encoding (UKV-UE) mechanism based on unary encoding, with which our framework can work effectively for a large key domain. We then validate our mechanism, which provides better utility and is suitable for mobile devices, by evaluating it on two real datasets. Finally, some possible future research directions are envisioned.
Funding: This work is supported by the NSFC [Grant Nos. 61772281, 61703212, 61602254], the Jiangsu Province Natural Science Foundation [Grant No. BK2160968], the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), and the Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET).
Abstract: Privacy protection is a hot research topic in the information security field. An improved XGBoost algorithm is proposed to protect privacy in classification tasks. By combining XGBoost with differential privacy protection, the algorithm can improve classification accuracy while protecting private information. When a CART regression tree is used to build a single decision tree, noise is added according to the Laplace mechanism. Compared with the random forest algorithm, this algorithm can reduce computation cost and prevent overfitting to a certain extent. The experimental results show that the proposed algorithm is more effective than other traditional algorithms while protecting the private information in the training data.
Funding: This work was supported by the National Key R&D Program of China under Grant 2023YFB2703802 and the Hunan Province Innovation and Entrepreneurship Training Program for College Students (S202311528073).
Abstract: Sharing data while protecting privacy in the industrial Internet is a significant challenge. Traditional machine learning methods require a combination of all data for training; however, this approach can be limited by data availability and privacy concerns. Federated learning (FL) has gained considerable attention because it allows for decentralized training on multiple local datasets. However, the training data collected by data providers are often non-independent and identically distributed (non-IID), resulting in poor FL performance. This paper proposes a privacy-preserving approach for sharing non-IID data in the industrial Internet using an FL approach based on blockchain technology. To overcome the problem of non-IID data leading to poor training accuracy, we propose dynamically updating the local model based on the divergence of the global and local models. This approach can significantly improve the accuracy of FL training when there is relatively large dispersion. In addition, we design a dynamic gradient clipping algorithm to alleviate the influence of noise on the model accuracy to reduce potential privacy leakage caused by sharing model parameters. Finally, we evaluate the performance of the proposed scheme using commonly used open-source image datasets. The simulation results demonstrate that our method can significantly enhance the accuracy while protecting privacy and maintaining efficiency, thereby providing a new solution to data-sharing and privacy-protection challenges in the industrial Internet.
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 62102311, 62202377, 62272385), in part by the Natural Science Basic Research Program of Shaanxi (Nos. 2022JQ-600, 2022JM-353, 2023-JC-QN-0327), in part by the Shaanxi Distinguished Youth Project (No. 2022JC-47), in part by the Scientific Research Program Funded by Shaanxi Provincial Education Department (No. 22JK0560), in part by Distinguished Youth Talents of Shaanxi Universities, and in part by the Youth Innovation Team of Shaanxi Universities.
Abstract: With widespread data collection and processing, privacy-preserving machine learning has become increasingly important in addressing privacy risks to individuals. The support vector machine (SVM) is one of the most elementary learning models in machine learning, and privacy issues surrounding SVM classifier training have attracted increasing attention. In this paper, we investigate Differential Privacy-compliant Federated Machine Learning with Dimensionality Reduction, called FedDPDR-DPML, which greatly improves data utility while providing strong privacy guarantees. In distributed learning scenarios, multiple participants usually hold unbalanced or small amounts of data. Therefore, FedDPDR-DPML enables multiple participants to collaboratively learn a global model based on weighted model averaging and knowledge aggregation, and the server then distributes the global model to each participant to improve local data utility. For high-dimensional data, we adopt differential privacy in both the principal component analysis (PCA)-based dimensionality reduction phase and the SVM classifier training phase, which improves model accuracy while achieving strict differential privacy protection. Besides, we train differential privacy (DP)-compliant SVM classifiers by adding noise to the objective function itself, thus leading to better data utility. Extensive experiments on three high-dimensional datasets demonstrate that FedDPDR-DPML can achieve high accuracy while ensuring strong privacy protection.