As a distributed machine learning method,federated learning(FL)has the advantage of naturally protecting data privacy.It keeps data locally and trains local models through local data to protect the privacy of local da...As a distributed machine learning method,federated learning(FL)has the advantage of naturally protecting data privacy.It keeps data locally and trains local models through local data to protect the privacy of local data.The federated learning method effectively solves the problem of artificial Smart data islands and privacy protection issues.However,existing research shows that attackersmay still steal user information by analyzing the parameters in the federated learning training process and the aggregation parameters on the server side.To solve this problem,differential privacy(DP)techniques are widely used for privacy protection in federated learning.However,adding Gaussian noise perturbations to the data degrades the model learning performance.To address these issues,this paper proposes a differential privacy federated learning scheme based on adaptive Gaussian noise(DPFL-AGN).To protect the data privacy and security of the federated learning training process,adaptive Gaussian noise is specifically added in the training process to hide the real parameters uploaded by the client.In addition,this paper proposes an adaptive noise reduction method.With the convergence of the model,the Gaussian noise in the later stage of the federated learning training process is reduced adaptively.This paper conducts a series of simulation experiments on realMNIST and CIFAR-10 datasets,and the results show that the DPFL-AGN algorithmperforms better compared to the other algorithms.展开更多
In recent years,the research field of data collection under local differential privacy(LDP)has expanded its focus fromelementary data types to includemore complex structural data,such as set-value and graph data.Howev...In recent years,the research field of data collection under local differential privacy(LDP)has expanded its focus fromelementary data types to includemore complex structural data,such as set-value and graph data.However,our comprehensive review of existing literature reveals that there needs to be more studies that engage with key-value data collection.Such studies would simultaneously collect the frequencies of keys and the mean of values associated with each key.Additionally,the allocation of the privacy budget between the frequencies of keys and the means of values for each key does not yield an optimal utility tradeoff.Recognizing the importance of obtaining accurate key frequencies and mean estimations for key-value data collection,this paper presents a novel framework:the Key-Strategy Framework forKey-ValueDataCollection under LDP.Initially,theKey-StrategyUnary Encoding(KS-UE)strategy is proposed within non-interactive frameworks for the purpose of privacy budget allocation to achieve precise key frequencies;subsequently,the Key-Strategy Generalized Randomized Response(KS-GRR)strategy is introduced for interactive frameworks to enhance the efficiency of collecting frequent keys through group-anditeration methods.Both strategies are adapted for scenarios in which users possess either a single or multiple key-value pairs.Theoretically,we demonstrate that the variance of KS-UE is lower than that of existing methods.These claims are substantiated through extensive experimental evaluation on real-world datasets,confirming the effectiveness and efficiency of the KS-UE and KS-GRR strategies.展开更多
The proliferation of Large Language Models (LLMs) across various sectors underscored the urgency of addressing potential privacy breaches. Vulnerabilities, such as prompt injection attacks and other adversarial tactic...The proliferation of Large Language Models (LLMs) across various sectors underscored the urgency of addressing potential privacy breaches. Vulnerabilities, such as prompt injection attacks and other adversarial tactics, could make these models inadvertently disclose their training data. Such disclosures could compromise personal identifiable information, posing significant privacy risks. In this paper, we proposed a novel multi-faceted approach called Whispered Tuning to address privacy leaks in large language models (LLMs). We integrated a PII redaction model, differential privacy techniques, and an output filter into the LLM fine-tuning process to enhance confidentiality. Additionally, we introduced novel ideas like the Epsilon Dial for adjustable privacy budgeting for differentiated Training Phases per data handler role. Through empirical validation, including attacks on non-private models, we demonstrated the robustness of our proposed solution SecureNLP in safeguarding privacy without compromising utility. This pioneering methodology significantly fortified LLMs against privacy infringements, enabling responsible adoption across sectors.展开更多
With the widespread data collection and processing,privacy-preserving machine learning has become increasingly important in addressing privacy risks related to individuals.Support vector machine(SVM)is one of the most...With the widespread data collection and processing,privacy-preserving machine learning has become increasingly important in addressing privacy risks related to individuals.Support vector machine(SVM)is one of the most elementary learning models of machine learning.Privacy issues surrounding SVM classifier training have attracted increasing attention.In this paper,we investigate Differential Privacy-compliant Federated Machine Learning with Dimensionality Reduction,called FedDPDR-DPML,which greatly improves data utility while providing strong privacy guarantees.Considering in distributed learning scenarios,multiple participants usually hold unbalanced or small amounts of data.Therefore,FedDPDR-DPML enables multiple participants to collaboratively learn a global model based on weighted model averaging and knowledge aggregation and then the server distributes the global model to each participant to improve local data utility.Aiming at high-dimensional data,we adopt differential privacy in both the principal component analysis(PCA)-based dimensionality reduction phase and SVM classifiers training phase,which improves model accuracy while achieving strict differential privacy protection.Besides,we train Differential privacy(DP)-compliant SVM classifiers by adding noise to the objective function itself,thus leading to better data utility.Extensive experiments on three high-dimensional datasets demonstrate that FedDPDR-DPML can achieve high accuracy while ensuring strong privacy protection.展开更多
In the realm of large-scale machine learning,it is crucial to explore methods for reducing computational complexity and memory demands while maintaining generalization performance.Additionally,since the collected data...In the realm of large-scale machine learning,it is crucial to explore methods for reducing computational complexity and memory demands while maintaining generalization performance.Additionally,since the collected data may contain some sensitive information,it is also of great significance to study privacy-preserving machine learning algorithms.This paper focuses on the performance of the differentially private stochastic gradient descent(SGD)algorithm based on random features.To begin,the algorithm maps the original data into a lowdimensional space,thereby avoiding the traditional kernel method for large-scale data storage requirement.Subsequently,the algorithm iteratively optimizes parameters using the stochastic gradient descent approach.Lastly,the output perturbation mechanism is employed to introduce random noise,ensuring algorithmic privacy.We prove that the proposed algorithm satisfies the differential privacy while achieving fast convergence rates under some mild conditions.展开更多
This paper investigates the problem of collecting multidimensional data throughout time(i.e.,longitudinal studies)for the fundamental task of frequency estimation under Local Differential Privacy(LDP)guarantees.Contra...This paper investigates the problem of collecting multidimensional data throughout time(i.e.,longitudinal studies)for the fundamental task of frequency estimation under Local Differential Privacy(LDP)guarantees.Contrary to frequency estimation of a single attribute,the multidimensional aspect demands particular attention to the privacy budget.Besides,when collecting user statistics longitudinally,privacy progressively degrades.Indeed,the“multiple”settings in combination(i.e.,many attributes and several collections throughout time)impose several challenges,for which this paper proposes the first solution for frequency estimates under LDP.To tackle these issues,we extend the analysis of three state-of-the-art LDP protocols(Generalized Randomized Response–GRR,Optimized Unary Encoding–OUE,and Symmetric Unary Encoding–SUE)for both longitudinal and multidimensional data collections.While the known literature uses OUE and SUE for two rounds of sanitization(a.k.a.memoization),i.e.,L-OUE and L-SUE,respectively,we analytically and experimentally show that starting with OUE and then with SUE provides higher data utility(i.e.,L-OSUE).Also,for attributes with small domain sizes,we propose Longitudinal GRR(L-GRR),which provides higher utility than the other protocols based on unary encoding.Last,we also propose a new solution named Adaptive LDP for LOngitudinal and Multidimensional FREquency Estimates(ALLOMFREE),which randomly samples a single attribute to be sent with the whole privacy budget and adaptively selects the optimal protocol,i.e.,either L-GRR or L-OSUE.As shown in the results,ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimates.展开更多
To realize data sharing,and to fully use the data value,breaking the data island between institutions to realize data collaboration has become a new sharing mode.This paper proposed a distributed data security sharing...To realize data sharing,and to fully use the data value,breaking the data island between institutions to realize data collaboration has become a new sharing mode.This paper proposed a distributed data security sharing scheme based on C/S communication mode,and constructed a federated learning architecture that uses differential privacy technology to protect training parameters.Clients do not need to share local data,and they only need to upload the trained model parameters to achieve data sharing.In the process of training,a distributed parameter update mechanism is introduced.The server is mainly responsible for issuing training commands and parameters,and aggregating the local model parameters uploaded by the clients.The client mainly uses the stochastic gradient descent algorithm for gradient trimming,updates,and transmits the trained model parameters back to the server after differential processing.To test the performance of the scheme,in the application scenario where many medical institutions jointly train the disease detection system,the model is tested from multiple perspectives by taking medical data as an example.From the testing results,we can know that for this specific test dataset,when the parameters are properly configured,the lowest prediction accuracy rate is 90.261%and the highest accuracy rate is up to 94.352.It shows that the performance of the model is good.The results also show that this scheme realizes data sharing while protecting data privacy,completes accurate prediction of diseases,and has a good effect.展开更多
Federated learning is a distributed machine learning technique that trains a global model by exchanging model parameters or intermediate results among multiple data sources. Although federated learning achieves physic...Federated learning is a distributed machine learning technique that trains a global model by exchanging model parameters or intermediate results among multiple data sources. Although federated learning achieves physical isolation of data, the local data of federated learning clients are still at risk of leakage under the attack of malicious individuals. For this reason, combining data protection techniques (e.g., differential privacy techniques) with federated learning is a sure way to further improve the data security of federated learning models. In this survey, we review recent advances in the research of differentially-private federated learning models. First, we introduce the workflow of federated learning and the theoretical basis of differential privacy. Then, we review three differentially-private federated learning paradigms: central differential privacy, local differential privacy, and distributed differential privacy. After this, we review the algorithmic optimization and communication cost optimization of federated learning models with differential privacy. Finally, we review the applications of federated learning models with differential privacy in various domains. By systematically summarizing the existing research, we propose future research opportunities.展开更多
With the development of Internet of Things(IoT),the delay caused by network transmission has led to low data processing efficiency.At the same time,the limited computing power and available energy consumption of IoT t...With the development of Internet of Things(IoT),the delay caused by network transmission has led to low data processing efficiency.At the same time,the limited computing power and available energy consumption of IoT terminal devices are also the important bottlenecks that would restrict the application of blockchain,but edge computing could solve this problem.The emergence of edge computing can effectively reduce the delay of data transmission and improve data processing capacity.However,user data in edge computing is usually stored and processed in some honest-but-curious authorized entities,which leads to the leakage of users’privacy information.In order to solve these problems,this paper proposes a location data collection method that satisfies the local differential privacy to protect users’privacy.In this paper,a Voronoi diagram constructed by the Delaunay method is used to divide the road network space and determine the Voronoi grid region where the edge nodes are located.A random disturbance mechanism that satisfies the local differential privacy is utilized to disturb the original location data in each Voronoi grid.In addition,the effectiveness of the proposed privacy-preserving mechanism is verified through comparison experiments.Compared with the existing privacy-preserving methods,the proposed privacy-preserving mechanism can not only better meet users’privacy needs,but also have higher data availability.展开更多
In recent years,with the continuous advancement of the intelligent process of the Internet of Vehicles(IoV),the problem of privacy leakage in IoV has become increasingly prominent.The research on the privacy protectio...In recent years,with the continuous advancement of the intelligent process of the Internet of Vehicles(IoV),the problem of privacy leakage in IoV has become increasingly prominent.The research on the privacy protection of the IoV has become the focus of the society.This paper analyzes the advantages and disadvantages of the existing location privacy protection system structure and algorithms,proposes a privacy protection system structure based on untrusted data collection server,and designs a vehicle location acquisition algorithm based on a local differential privacy and game model.The algorithm first meshes the road network space.Then,the dynamic game model is introduced into the game user location privacy protection model and the attacker location semantic inference model,thereby minimizing the possibility of exposing the regional semantic privacy of the k-location set while maximizing the availability of the service.On this basis,a statistical method is designed,which satisfies the local differential privacy of k-location sets and obtains unbiased estimation of traffic density in different regions.Finally,this paper verifies the algorithm based on the data set of mobile vehicles in Shanghai.The experimental results show that the algorithm can guarantee the user’s location privacy and location semantic privacy while satisfying the service quality requirements,and provide better privacy protection and service for the users of the IoV.展开更多
By integrating the traditional power grid with information and communication technology, smart grid achieves dependable, efficient, and flexible grid data processing. The smart meters deployed on the user side of the ...By integrating the traditional power grid with information and communication technology, smart grid achieves dependable, efficient, and flexible grid data processing. The smart meters deployed on the user side of the smart grid collect the users' power usage data on a regular basis and upload it to the control center to complete the smart grid data acquisition. The control center can evaluate the supply and demand of the power grid through aggregated data from users and then dynamically adjust the power supply and price, etc. However, since the grid data collected from users may disclose the user's electricity usage habits and daily activities, privacy concern has become a critical issue in smart grid data aggregation. Most of the existing privacy-preserving data collection schemes for smart grid adopt homomorphic encryption or randomization techniques which are either impractical because of the high computation overhead or unrealistic for requiring a trusted third party.展开更多
Social network contains the interaction between social members, which constitutes the structure and attribute of social network. The interactive relationship of social network contains a lot of personal privacy inform...Social network contains the interaction between social members, which constitutes the structure and attribute of social network. The interactive relationship of social network contains a lot of personal privacy information. The direct release of social network data will cause the disclosure of privacy information. Aiming at the dynamic characteristics of social network data release, a new dynamic social network data publishing method based on differential privacy was proposed. This method was consistent with differential privacy. It is named DDPA (Dynamic Differential Privacy Algorithm). DDPA algorithm is an improvement of privacy protection algorithm in static social network data publishing. DDPA adds noise which follows Laplace to network edge weights. DDPA identifies the edge weight information that changes as the number of iterations increases, adding the privacy protection budget. Through experiments on real data sets, the results show that the DDPA algorithm satisfies the user’s privacy requirement in social network. DDPA reduces the execution time brought by iterations and reduces the information loss rate of graph structure.展开更多
Federated Learning(FL)is a new computing paradigm in privacy-preserving Machine Learning(ML),where the ML model is trained in a decentralized manner by the clients,preventing the server from directly accessing privacy...Federated Learning(FL)is a new computing paradigm in privacy-preserving Machine Learning(ML),where the ML model is trained in a decentralized manner by the clients,preventing the server from directly accessing privacy-sensitive data from the clients.Unfortunately,recent advances have shown potential risks for user-level privacy breaches under the cross-silo FL framework.In this paper,we propose addressing the issue by using a three-plane framework to secure the cross-silo FL,taking advantage of the Local Differential Privacy(LDP)mechanism.The key insight here is that LDP can provide strong data privacy protection while still retaining user data statistics to preserve its high utility.Experimental results on three real-world datasets demonstrate the effectiveness of our framework.展开更多
Health monitoring data or the data about infectious diseases such as COVID-19 may need to be constantly updated and dynamically released,but they may contain user's sensitive information.Thus,how to preserve the u...Health monitoring data or the data about infectious diseases such as COVID-19 may need to be constantly updated and dynamically released,but they may contain user's sensitive information.Thus,how to preserve the user's privacy before their release is critically important yet challenging.Differential Privacy(DP)is well-known to provide effective privacy protection,and thus the dynamic DP preserving data release was designed to publish a histogram to meet DP guarantee.Unfortunately,this scheme may result in high cumulative errors and lower the data availability.To address this problem,in this paper,we apply Jensen-Shannon(JS)divergence to design the OPTICS(Ordering Points To Identify The Clustering Structure)scheme.It uses JS divergence to measure the difference between the updated data set at the current release time and private data set at the previous release time.By comparing the difference with a threshold,only when the difference is greater than the threshold,can we apply OPTICS to publish DP protected data sets.Our experimental results show that the absolute errors and average relative errors are significantly lower than those existing works.展开更多
Frequent itemset mining is an essential problem in data mining and plays a key role in many data mining applications.However,users’personal privacy will be leaked in the mining process.In recent years,application of ...Frequent itemset mining is an essential problem in data mining and plays a key role in many data mining applications.However,users’personal privacy will be leaked in the mining process.In recent years,application of local differential privacy protection models to mine frequent itemsets is a relatively reliable and secure protection method.Local differential privacy means that users first perturb the original data and then send these data to the aggregator,preventing the aggregator from revealing the user’s private information.We propose a novel framework that implements frequent itemset mining under local differential privacy and is applicable to user’s multi-attribute.The main technique has bitmap encoding for converting the user’s original data into a binary string.It also includes how to choose the best perturbation algorithm for varying user attributes,and uses the frequent pattern tree(FP-tree)algorithm to mine frequent itemsets.Finally,we incorporate the threshold random response(TRR)algorithm in the framework and compare it with the existing algorithms,and demonstrate that the TRR algorithm has higher accuracy for mining frequent itemsets.展开更多
In recent years,mobile Internet technology and location based services have wide application.Application providers and users have accumulated huge amount of trajectory data.While publishing and analyzing user trajecto...In recent years,mobile Internet technology and location based services have wide application.Application providers and users have accumulated huge amount of trajectory data.While publishing and analyzing user trajectory data have brought great convenience for people,the disclosure risks of user privacy caused by the trajectory data publishing are also becoming more and more prominent.Traditional k-anonymous trajectory data publishing technologies cannot effectively protect user privacy against attackers with strong background knowledge.For privacy preserving trajectory data publishing,we propose a differential privacy based(k-Ψ)-anonymity method to defend against re-identification and probabilistic inference attack.The proposed method is divided into two phases:in the first phase,a dummy-based(k-Ψ)-anonymous trajectory data publishing algorithm is given,which improves(k-δ)-anonymity by considering changes of thresholdδon different road segments and constructing an adaptive threshold setΨthat takes into account road network information.In the second phase,Laplace noise regarding distance of anonymous locations under differential privacy is used for trajectory perturbation of the anonymous trajectory dataset outputted by the first phase.Experiments on real road network dataset are performed and the results show that the proposed method improves the trajectory indistinguishability and achieves good data utility in condition of preserving user privacy.展开更多
Due to the overwhelming characteristics of the Internet of Things(IoT)and its adoption in approximately every aspect of our lives,the concept of individual devices’privacy has gained prominent attention from both cus...Due to the overwhelming characteristics of the Internet of Things(IoT)and its adoption in approximately every aspect of our lives,the concept of individual devices’privacy has gained prominent attention from both customers,i.e.,people,and industries as wearable devices collect sensitive information about patients(both admitted and outdoor)in smart healthcare infrastructures.In addition to privacy,outliers or noise are among the crucial issues,which are directly correlated with IoT infrastructures,as most member devices are resource-limited and could generate or transmit false data that is required to be refined before processing,i.e.,transmitting.Therefore,the development of privacy-preserving information fusion techniques is highly encouraged,especially those designed for smart IoT-enabled domains.In this paper,we are going to present an effective hybrid approach that can refine raw data values captured by the respectivemember device before transmission while preserving its privacy through the utilization of the differential privacy technique in IoT infrastructures.Sliding window,i.e.,δi based dynamic programming methodology,is implemented at the device level to ensure precise and accurate detection of outliers or noisy data,and refine it prior to activation of the respective transmission activity.Additionally,an appropriate privacy budget has been selected,which is enough to ensure the privacy of every individualmodule,i.e.,a wearable device such as a smartwatch attached to the patient’s body.In contrast,the end module,i.e.,the server in this case,can extract important information with approximately the maximum level of accuracy.Moreover,refined data has been processed by adding an appropriate nose through the Laplace mechanism to make it useless or meaningless for the adversary modules in the IoT.The proposed hybrid approach is trusted from both the device’s privacy and the integrity of the transmitted information perspectives.Simulation and analytical results have proved that the proposed privacy-preserving information fusion technique for wearable devices is an ideal solution for resource-constrained infrastructures such as IoT and the Internet ofMedical Things,where both device privacy and information integrity are important.Finally,the proposed hybrid approach is proven against well-known intruder attacks,especially those related to the privacy of the respective device in IoT infrastructures.展开更多
Mobile edge computing(MEC)is an emerging technolohgy that extends cloud computing to the edge of a network.MEC has been applied to a variety of services.Specially,MEC can help to reduce network delay and improve the s...Mobile edge computing(MEC)is an emerging technolohgy that extends cloud computing to the edge of a network.MEC has been applied to a variety of services.Specially,MEC can help to reduce network delay and improve the service quality of recommendation systems.In a MEC-based recommendation system,users’rating data are collected and analyzed by the edge servers.If the servers behave dishonestly or break down,users’privacy may be disclosed.To solve this issue,we design a recommendation framework that applies local differential privacy(LDP)to collaborative filtering.In the proposed framework,users’rating data are perturbed to satisfy LDP and then released to the edge servers.The edge servers perform partial computing task by using the perturbed data.The cloud computing center computes the similarity between items by using the computing results generated by edge servers.We propose a data perturbation method to protect user’s original rating values,where the Harmony mechanism is modified so as to preserve the accuracy of similarity computation.And to enhance the protection of privacy,we propose two methods to protect both users’rating values and rating behaviors.Experimental results on real-world data demonstrate that the proposed methods perform better than existing differentially private recommendation methods.展开更多
基金the Sichuan Provincial Science and Technology Department Project under Grant 2019YFN0104the Yibin Science and Technology Plan Project under Grant 2021GY008the Sichuan University of Science and Engineering Postgraduate Innovation Fund Project under Grant Y2022154.
文摘As a distributed machine learning method,federated learning(FL)has the advantage of naturally protecting data privacy.It keeps data locally and trains local models through local data to protect the privacy of local data.The federated learning method effectively solves the problem of artificial Smart data islands and privacy protection issues.However,existing research shows that attackersmay still steal user information by analyzing the parameters in the federated learning training process and the aggregation parameters on the server side.To solve this problem,differential privacy(DP)techniques are widely used for privacy protection in federated learning.However,adding Gaussian noise perturbations to the data degrades the model learning performance.To address these issues,this paper proposes a differential privacy federated learning scheme based on adaptive Gaussian noise(DPFL-AGN).To protect the data privacy and security of the federated learning training process,adaptive Gaussian noise is specifically added in the training process to hide the real parameters uploaded by the client.In addition,this paper proposes an adaptive noise reduction method.With the convergence of the model,the Gaussian noise in the later stage of the federated learning training process is reduced adaptively.This paper conducts a series of simulation experiments on realMNIST and CIFAR-10 datasets,and the results show that the DPFL-AGN algorithmperforms better compared to the other algorithms.
基金supported by a grant fromthe National Key R&DProgram of China.
文摘In recent years,the research field of data collection under local differential privacy(LDP)has expanded its focus fromelementary data types to includemore complex structural data,such as set-value and graph data.However,our comprehensive review of existing literature reveals that there needs to be more studies that engage with key-value data collection.Such studies would simultaneously collect the frequencies of keys and the mean of values associated with each key.Additionally,the allocation of the privacy budget between the frequencies of keys and the means of values for each key does not yield an optimal utility tradeoff.Recognizing the importance of obtaining accurate key frequencies and mean estimations for key-value data collection,this paper presents a novel framework:the Key-Strategy Framework forKey-ValueDataCollection under LDP.Initially,theKey-StrategyUnary Encoding(KS-UE)strategy is proposed within non-interactive frameworks for the purpose of privacy budget allocation to achieve precise key frequencies;subsequently,the Key-Strategy Generalized Randomized Response(KS-GRR)strategy is introduced for interactive frameworks to enhance the efficiency of collecting frequent keys through group-anditeration methods.Both strategies are adapted for scenarios in which users possess either a single or multiple key-value pairs.Theoretically,we demonstrate that the variance of KS-UE is lower than that of existing methods.These claims are substantiated through extensive experimental evaluation on real-world datasets,confirming the effectiveness and efficiency of the KS-UE and KS-GRR strategies.
文摘The proliferation of Large Language Models (LLMs) across various sectors underscored the urgency of addressing potential privacy breaches. Vulnerabilities, such as prompt injection attacks and other adversarial tactics, could make these models inadvertently disclose their training data. Such disclosures could compromise personal identifiable information, posing significant privacy risks. In this paper, we proposed a novel multi-faceted approach called Whispered Tuning to address privacy leaks in large language models (LLMs). We integrated a PII redaction model, differential privacy techniques, and an output filter into the LLM fine-tuning process to enhance confidentiality. Additionally, we introduced novel ideas like the Epsilon Dial for adjustable privacy budgeting for differentiated Training Phases per data handler role. Through empirical validation, including attacks on non-private models, we demonstrated the robustness of our proposed solution SecureNLP in safeguarding privacy without compromising utility. This pioneering methodology significantly fortified LLMs against privacy infringements, enabling responsible adoption across sectors.
基金supported in part by National Natural Science Foundation of China(Nos.62102311,62202377,62272385)in part by Natural Science Basic Research Program of Shaanxi(Nos.2022JQ-600,2022JM-353,2023-JC-QN-0327)+2 种基金in part by Shaanxi Distinguished Youth Project(No.2022JC-47)in part by Scientific Research Program Funded by Shaanxi Provincial Education Department(No.22JK0560)in part by Distinguished Youth Talents of Shaanxi Universities,and in part by Youth Innovation Team of Shaanxi Universities.
文摘With the widespread data collection and processing,privacy-preserving machine learning has become increasingly important in addressing privacy risks related to individuals.Support vector machine(SVM)is one of the most elementary learning models of machine learning.Privacy issues surrounding SVM classifier training have attracted increasing attention.In this paper,we investigate Differential Privacy-compliant Federated Machine Learning with Dimensionality Reduction,called FedDPDR-DPML,which greatly improves data utility while providing strong privacy guarantees.Considering in distributed learning scenarios,multiple participants usually hold unbalanced or small amounts of data.Therefore,FedDPDR-DPML enables multiple participants to collaboratively learn a global model based on weighted model averaging and knowledge aggregation and then the server distributes the global model to each participant to improve local data utility.Aiming at high-dimensional data,we adopt differential privacy in both the principal component analysis(PCA)-based dimensionality reduction phase and SVM classifiers training phase,which improves model accuracy while achieving strict differential privacy protection.Besides,we train Differential privacy(DP)-compliant SVM classifiers by adding noise to the objective function itself,thus leading to better data utility.Extensive experiments on three high-dimensional datasets demonstrate that FedDPDR-DPML can achieve high accuracy while ensuring strong privacy protection.
基金supported by Zhejiang Provincial Natural Science Foundation of China(LR20A010001)National Natural Science Foundation of China(12271473 and U21A20426)。
文摘In the realm of large-scale machine learning,it is crucial to explore methods for reducing computational complexity and memory demands while maintaining generalization performance.Additionally,since the collected data may contain some sensitive information,it is also of great significance to study privacy-preserving machine learning algorithms.This paper focuses on the performance of the differentially private stochastic gradient descent(SGD)algorithm based on random features.To begin,the algorithm maps the original data into a lowdimensional space,thereby avoiding the traditional kernel method for large-scale data storage requirement.Subsequently,the algorithm iteratively optimizes parameters using the stochastic gradient descent approach.Lastly,the output perturbation mechanism is employed to introduce random noise,ensuring algorithmic privacy.We prove that the proposed algorithm satisfies the differential privacy while achieving fast convergence rates under some mild conditions.
基金supported by the Agence Nationale de la Recherche(ANR)(contract“ANR-17-EURE-0002”)by the Region of Bourgogne Franche-ComtéCADRAN Projectsupported by the European Research Council(ERC)project HYPATIA under the European Union's Horizon 2020 research and innovation programme.Grant agreement n.835294。
文摘This paper investigates the problem of collecting multidimensional data throughout time(i.e.,longitudinal studies)for the fundamental task of frequency estimation under Local Differential Privacy(LDP)guarantees.Contrary to frequency estimation of a single attribute,the multidimensional aspect demands particular attention to the privacy budget.Besides,when collecting user statistics longitudinally,privacy progressively degrades.Indeed,the“multiple”settings in combination(i.e.,many attributes and several collections throughout time)impose several challenges,for which this paper proposes the first solution for frequency estimates under LDP.To tackle these issues,we extend the analysis of three state-of-the-art LDP protocols(Generalized Randomized Response–GRR,Optimized Unary Encoding–OUE,and Symmetric Unary Encoding–SUE)for both longitudinal and multidimensional data collections.While the known literature uses OUE and SUE for two rounds of sanitization(a.k.a.memoization),i.e.,L-OUE and L-SUE,respectively,we analytically and experimentally show that starting with OUE and then with SUE provides higher data utility(i.e.,L-OSUE).Also,for attributes with small domain sizes,we propose Longitudinal GRR(L-GRR),which provides higher utility than the other protocols based on unary encoding.Last,we also propose a new solution named Adaptive LDP for LOngitudinal and Multidimensional FREquency Estimates(ALLOMFREE),which randomly samples a single attribute to be sent with the whole privacy budget and adaptively selects the optimal protocol,i.e.,either L-GRR or L-OSUE.As shown in the results,ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimates.
基金This work was supported by Funding of the Nanjing Institute of Technology(No.KE21-451).
文摘To realize data sharing,and to fully use the data value,breaking the data island between institutions to realize data collaboration has become a new sharing mode.This paper proposed a distributed data security sharing scheme based on C/S communication mode,and constructed a federated learning architecture that uses differential privacy technology to protect training parameters.Clients do not need to share local data,and they only need to upload the trained model parameters to achieve data sharing.In the process of training,a distributed parameter update mechanism is introduced.The server is mainly responsible for issuing training commands and parameters,and aggregating the local model parameters uploaded by the clients.The client mainly uses the stochastic gradient descent algorithm for gradient trimming,updates,and transmits the trained model parameters back to the server after differential processing.To test the performance of the scheme,in the application scenario where many medical institutions jointly train the disease detection system,the model is tested from multiple perspectives by taking medical data as an example.From the testing results,we can know that for this specific test dataset,when the parameters are properly configured,the lowest prediction accuracy rate is 90.261%and the highest accuracy rate is up to 94.352.It shows that the performance of the model is good.The results also show that this scheme realizes data sharing while protecting data privacy,completes accurate prediction of diseases,and has a good effect.
文摘Federated learning is a distributed machine learning technique that trains a global model by exchanging model parameters or intermediate results among multiple data sources. Although federated learning achieves physical isolation of data, the local data of federated learning clients are still at risk of leakage under the attack of malicious individuals. For this reason, combining data protection techniques (e.g., differential privacy techniques) with federated learning is a sure way to further improve the data security of federated learning models. In this survey, we review recent advances in the research of differentially-private federated learning models. First, we introduce the workflow of federated learning and the theoretical basis of differential privacy. Then, we review three differentially-private federated learning paradigms: central differential privacy, local differential privacy, and distributed differential privacy. After this, we review the algorithmic optimization and communication cost optimization of federated learning models with differential privacy. Finally, we review the applications of federated learning models with differential privacy in various domains. By systematically summarizing the existing research, we propose future research opportunities.
文摘With the development of Internet of Things(IoT),the delay caused by network transmission has led to low data processing efficiency.At the same time,the limited computing power and available energy consumption of IoT terminal devices are also the important bottlenecks that would restrict the application of blockchain,but edge computing could solve this problem.The emergence of edge computing can effectively reduce the delay of data transmission and improve data processing capacity.However,user data in edge computing is usually stored and processed in some honest-but-curious authorized entities,which leads to the leakage of users’privacy information.In order to solve these problems,this paper proposes a location data collection method that satisfies the local differential privacy to protect users’privacy.In this paper,a Voronoi diagram constructed by the Delaunay method is used to divide the road network space and determine the Voronoi grid region where the edge nodes are located.A random disturbance mechanism that satisfies the local differential privacy is utilized to disturb the original location data in each Voronoi grid.In addition,the effectiveness of the proposed privacy-preserving mechanism is verified through comparison experiments.Compared with the existing privacy-preserving methods,the proposed privacy-preserving mechanism can not only better meet users’privacy needs,but also have higher data availability.
基金This work is supported by Major Scientific and Technological Special Project of Guizhou Province(20183001)Research on the education mode for complicate skill students in new media with cross specialty integration(22150117092)+2 种基金Open Foundation of Guizhou Provincial Key Laboratory of Public Big Data(2018BDKFJJ014)Open Foundation of Guizhou Provincial Key Laboratory of Public Big Data(2018BDKFJJ019)Open Foundation of Guizhou Provincial Key Laboratory of Public Big Data(2018BDKFJJ022).
文摘In recent years,with the continuous advancement of the intelligent process of the Internet of Vehicles(IoV),the problem of privacy leakage in IoV has become increasingly prominent.The research on the privacy protection of the IoV has become the focus of the society.This paper analyzes the advantages and disadvantages of the existing location privacy protection system structure and algorithms,proposes a privacy protection system structure based on untrusted data collection server,and designs a vehicle location acquisition algorithm based on a local differential privacy and game model.The algorithm first meshes the road network space.Then,the dynamic game model is introduced into the game user location privacy protection model and the attacker location semantic inference model,thereby minimizing the possibility of exposing the regional semantic privacy of the k-location set while maximizing the availability of the service.On this basis,a statistical method is designed,which satisfies the local differential privacy of k-location sets and obtains unbiased estimation of traffic density in different regions.Finally,this paper verifies the algorithm based on the data set of mobile vehicles in Shanghai.The experimental results show that the algorithm can guarantee the user’s location privacy and location semantic privacy while satisfying the service quality requirements,and provide better privacy protection and service for the users of the IoV.
基金supported in part by the National Natural Science Foundation of China under Grant No.61972371Youth Innovation Promotion Association of Chinese Academy of Sciences(CAS)under Grant No.Y202093.
文摘By integrating the traditional power grid with information and communication technology, smart grid achieves dependable, efficient, and flexible grid data processing. The smart meters deployed on the user side of the smart grid collect the users' power usage data on a regular basis and upload it to the control center to complete the smart grid data acquisition. The control center can evaluate the supply and demand of the power grid through aggregated data from users and then dynamically adjust the power supply and price, etc. However, since the grid data collected from users may disclose the user's electricity usage habits and daily activities, privacy concern has become a critical issue in smart grid data aggregation. Most of the existing privacy-preserving data collection schemes for smart grid adopt homomorphic encryption or randomization techniques which are either impractical because of the high computation overhead or unrealistic for requiring a trusted third party.
文摘Social network contains the interaction between social members, which constitutes the structure and attribute of social network. The interactive relationship of social network contains a lot of personal privacy information. The direct release of social network data will cause the disclosure of privacy information. Aiming at the dynamic characteristics of social network data release, a new dynamic social network data publishing method based on differential privacy was proposed. This method was consistent with differential privacy. It is named DDPA (Dynamic Differential Privacy Algorithm). DDPA algorithm is an improvement of privacy protection algorithm in static social network data publishing. DDPA adds noise which follows Laplace to network edge weights. DDPA identifies the edge weight information that changes as the number of iterations increases, adding the privacy protection budget. Through experiments on real data sets, the results show that the DDPA algorithm satisfies the user’s privacy requirement in social network. DDPA reduces the execution time brought by iterations and reduces the information loss rate of graph structure.
基金supported by the National Key R&D Program of China under Grant 2020YFB1806904by the National Natural Science Foundation of China under Grants 61872416,62171189,62172438 and 62071192+1 种基金by the Fundamental Research Funds for the Central Universities of China under Grant 2019kfyXJJS017,31732111303,31512111310by the special fund for Wuhan Yellow Crane Talents(Excellent Young Scholar).
文摘Federated Learning(FL)is a new computing paradigm in privacy-preserving Machine Learning(ML),where the ML model is trained in a decentralized manner by the clients,preventing the server from directly accessing privacy-sensitive data from the clients.Unfortunately,recent advances have shown potential risks for user-level privacy breaches under the cross-silo FL framework.In this paper,we propose addressing the issue by using a three-plane framework to secure the cross-silo FL,taking advantage of the Local Differential Privacy(LDP)mechanism.The key insight here is that LDP can provide strong data privacy protection while still retaining user data statistics to preserve its high utility.Experimental results on three real-world datasets demonstrate the effectiveness of our framework.
基金supported in part by National Natural Science Foundation of China(No.61672106)in part by Natural Science Foundation of Beijing,China(L192023)in part by the project of promoting the Classified Development of Beijing Information Science and Technology University(No.5112211038,5112211039)。
文摘Health monitoring data or the data about infectious diseases such as COVID-19 may need to be constantly updated and dynamically released,but they may contain user's sensitive information.Thus,how to preserve the user's privacy before their release is critically important yet challenging.Differential Privacy(DP)is well-known to provide effective privacy protection,and thus the dynamic DP preserving data release was designed to publish a histogram to meet DP guarantee.Unfortunately,this scheme may result in high cumulative errors and lower the data availability.To address this problem,in this paper,we apply Jensen-Shannon(JS)divergence to design the OPTICS(Ordering Points To Identify The Clustering Structure)scheme.It uses JS divergence to measure the difference between the updated data set at the current release time and private data set at the previous release time.By comparing the difference with a threshold,only when the difference is greater than the threshold,can we apply OPTICS to publish DP protected data sets.Our experimental results show that the absolute errors and average relative errors are significantly lower than those existing works.
基金This paper is supported by the Inner Mongolia Natural Science Foundation(Grant Number:2018MS06026,Sponsored Authors:Liu,H.and Ma,X.,Sponsors’Websites:http://kjt.nmg.gov.cn/)the Science and Technology Program of Inner Mongolia Autonomous Region(Grant Number:2019GG116,Sponsored Authors:Liu,H.and Ma,X.,Sponsors’Websites:http://kjt.nmg.gov.cn/).
文摘Frequent itemset mining is an essential problem in data mining and plays a key role in many data mining applications.However,users’personal privacy will be leaked in the mining process.In recent years,application of local differential privacy protection models to mine frequent itemsets is a relatively reliable and secure protection method.Local differential privacy means that users first perturb the original data and then send these data to the aggregator,preventing the aggregator from revealing the user’s private information.We propose a novel framework that implements frequent itemset mining under local differential privacy and is applicable to user’s multi-attribute.The main technique has bitmap encoding for converting the user’s original data into a binary string.It also includes how to choose the best perturbation algorithm for varying user attributes,and uses the frequent pattern tree(FP-tree)algorithm to mine frequent itemsets.Finally,we incorporate the threshold random response(TRR)algorithm in the framework and compare it with the existing algorithms,and demonstrate that the TRR algorithm has higher accuracy for mining frequent itemsets.
基金supported by the Fundamental Research Funds for the Central Universities(No.GK201906009)CERNET Innovation Project(No.NGII20190704)Science and Technology Program of Xi’an City(No.2019216914GXRC005CG006-GXYD5.2).
文摘In recent years,mobile Internet technology and location based services have wide application.Application providers and users have accumulated huge amount of trajectory data.While publishing and analyzing user trajectory data have brought great convenience for people,the disclosure risks of user privacy caused by the trajectory data publishing are also becoming more and more prominent.Traditional k-anonymous trajectory data publishing technologies cannot effectively protect user privacy against attackers with strong background knowledge.For privacy preserving trajectory data publishing,we propose a differential privacy based(k-Ψ)-anonymity method to defend against re-identification and probabilistic inference attack.The proposed method is divided into two phases:in the first phase,a dummy-based(k-Ψ)-anonymous trajectory data publishing algorithm is given,which improves(k-δ)-anonymity by considering changes of thresholdδon different road segments and constructing an adaptive threshold setΨthat takes into account road network information.In the second phase,Laplace noise regarding distance of anonymous locations under differential privacy is used for trajectory perturbation of the anonymous trajectory dataset outputted by the first phase.Experiments on real road network dataset are performed and the results show that the proposed method improves the trajectory indistinguishability and achieves good data utility in condition of preserving user privacy.
基金Ministry of Higher Education of Malaysia under theResearch GrantLRGS/1/2019/UKM-UKM/5/2 and Princess Nourah bint Abdulrahman University for financing this researcher through Supporting Project Number(PNURSP2024R235),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Due to the overwhelming characteristics of the Internet of Things(IoT)and its adoption in approximately every aspect of our lives,the concept of individual devices’privacy has gained prominent attention from both customers,i.e.,people,and industries as wearable devices collect sensitive information about patients(both admitted and outdoor)in smart healthcare infrastructures.In addition to privacy,outliers or noise are among the crucial issues,which are directly correlated with IoT infrastructures,as most member devices are resource-limited and could generate or transmit false data that is required to be refined before processing,i.e.,transmitting.Therefore,the development of privacy-preserving information fusion techniques is highly encouraged,especially those designed for smart IoT-enabled domains.In this paper,we are going to present an effective hybrid approach that can refine raw data values captured by the respectivemember device before transmission while preserving its privacy through the utilization of the differential privacy technique in IoT infrastructures.Sliding window,i.e.,δi based dynamic programming methodology,is implemented at the device level to ensure precise and accurate detection of outliers or noisy data,and refine it prior to activation of the respective transmission activity.Additionally,an appropriate privacy budget has been selected,which is enough to ensure the privacy of every individualmodule,i.e.,a wearable device such as a smartwatch attached to the patient’s body.In contrast,the end module,i.e.,the server in this case,can extract important information with approximately the maximum level of accuracy.Moreover,refined data has been processed by adding an appropriate nose through the Laplace mechanism to make it useless or meaningless for the adversary modules in the IoT.The proposed hybrid approach is trusted from both the device’s privacy and the integrity of the transmitted information perspectives.Simulation and analytical results have proved that the proposed privacy-preserving information fusion technique for wearable devices is an ideal solution for resource-constrained infrastructures such as IoT and the Internet ofMedical Things,where both device privacy and information integrity are important.Finally,the proposed hybrid approach is proven against well-known intruder attacks,especially those related to the privacy of the respective device in IoT infrastructures.
基金supported by National Natural Science Foundation of China(No.61871037)supported by Natural Science Foundation of Beijing(No.M21035).
文摘Mobile edge computing(MEC)is an emerging technolohgy that extends cloud computing to the edge of a network.MEC has been applied to a variety of services.Specially,MEC can help to reduce network delay and improve the service quality of recommendation systems.In a MEC-based recommendation system,users’rating data are collected and analyzed by the edge servers.If the servers behave dishonestly or break down,users’privacy may be disclosed.To solve this issue,we design a recommendation framework that applies local differential privacy(LDP)to collaborative filtering.In the proposed framework,users’rating data are perturbed to satisfy LDP and then released to the edge servers.The edge servers perform partial computing task by using the perturbed data.The cloud computing center computes the similarity between items by using the computing results generated by edge servers.We propose a data perturbation method to protect user’s original rating values,where the Harmony mechanism is modified so as to preserve the accuracy of similarity computation.And to enhance the protection of privacy,we propose two methods to protect both users’rating values and rating behaviors.Experimental results on real-world data demonstrate that the proposed methods perform better than existing differentially private recommendation methods.