In recent years, the research field of data collection under local differential privacy (LDP) has expanded its focus from elementary data types to more complex structural data, such as set-value and graph data. However, our comprehensive review of the existing literature reveals that studies engaging with key-value data collection remain scarce. Such studies would simultaneously collect the frequencies of keys and the means of the values associated with each key. Additionally, the existing allocation of the privacy budget between key frequencies and per-key value means does not yield an optimal utility tradeoff. Recognizing the importance of obtaining accurate key frequencies and mean estimations for key-value data collection, this paper presents a novel framework: the Key-Strategy Framework for Key-Value Data Collection under LDP. Initially, the Key-Strategy Unary Encoding (KS-UE) strategy is proposed within non-interactive frameworks for privacy budget allocation to achieve precise key frequencies; subsequently, the Key-Strategy Generalized Randomized Response (KS-GRR) strategy is introduced for interactive frameworks to enhance the efficiency of collecting frequent keys through group-and-iteration methods. Both strategies are adapted for scenarios in which users possess either a single key-value pair or multiple key-value pairs. Theoretically, we demonstrate that the variance of KS-UE is lower than that of existing methods. These claims are substantiated through extensive experimental evaluation on real-world datasets, confirming the effectiveness and efficiency of the KS-UE and KS-GRR strategies.
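The unary-encoding primitive underlying mechanisms such as KS-UE can be illustrated with a generic symmetric unary-encoding sketch (this is not the paper's exact KS-UE algorithm; the flip probabilities `p` and `q` are illustrative defaults):

```python
import random

def ue_perturb(value, domain, p=0.75, q=0.25):
    """Unary-encode `value` over `domain` as a bit vector, then flip
    each bit independently: a 1-bit stays 1 with probability p, and a
    0-bit becomes 1 with probability q."""
    return [
        1 if random.random() < (p if v == value else q) else 0
        for v in domain
    ]

def ue_estimate(reports, domain, p=0.75, q=0.25):
    """Unbiased frequency estimate for each domain item, correcting
    for the expected number of spurious 1-bits."""
    n = len(reports)
    counts = [sum(r[i] for r in reports) for i in range(len(domain))]
    return {v: (counts[i] - n * q) / (p - q) for i, v in enumerate(domain)}
```

With `p = 1` and `q = 0` the mechanism degenerates to plain one-hot encoding; lowering the gap `p - q` strengthens privacy at the cost of higher estimator variance, which is the tradeoff the paper's budget allocation targets.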
The rapid evolution of artificial intelligence (AI) technologies has significantly propelled the advancement of the Internet of Vehicles (IoV). With AI support, represented by machine learning technology, vehicles gain the capability to make intelligent decisions. As a distributed learning paradigm, federated learning (FL) has emerged as a preferred solution in IoV. Compared to traditional centralized machine learning, FL reduces communication overhead and improves privacy protection. Despite these benefits, FL still faces security and privacy concerns, such as poisoning attacks and inference attacks, prompting exploration into blockchain integration to enhance its security posture. This paper introduces a novel blockchain-enabled federated learning (BCFL) scheme with differential privacy (DP) tailored for IoV. To meet the performance demands of the IoV environment, the proposed methodology integrates a consortium blockchain with Practical Byzantine Fault Tolerance (PBFT) consensus, which offers superior efficiency over conventional public blockchains. In addition, the proposed approach utilizes the Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm in the local training process of FL for enhanced privacy protection. Experimental results indicate that the integration of blockchain elevates the security level of FL, in that the proposed approach effectively safeguards FL against poisoning attacks. On the other hand, the additional overhead associated with blockchain integration is limited to a moderate level, meeting the efficiency criteria of IoV. Furthermore, by incorporating DP, the proposed approach is shown to provide an (ε, δ)-privacy guarantee while maintaining an acceptable level of model accuracy. This enhancement effectively mitigates the threat of inference attacks on private information.
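The core DP-SGD update referenced above (per-example gradient clipping followed by Gaussian noise) can be sketched as follows; this is a minimal single-step illustration, not the paper's full training loop, and the parameter names are illustrative:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """One DP-SGD aggregation step: clip each per-example gradient to
    `clip_norm` in L2 norm, sum the clipped gradients, add Gaussian
    noise scaled to the clipping bound, and average over the batch."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)
```

Clipping bounds each client example's influence on the update, which is what allows the added Gaussian noise to yield the (ε, δ)-guarantee via standard DP accounting.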
As a distributed machine learning method, federated learning (FL) has the advantage of naturally protecting data privacy. It keeps data locally and trains local models on local data, protecting the privacy of that data. Federated learning thus effectively alleviates the problems of data islands and privacy protection. However, existing research shows that attackers may still steal user information by analyzing the parameters exchanged during federated learning training and the aggregation parameters on the server side. To solve this problem, differential privacy (DP) techniques are widely used for privacy protection in federated learning. However, adding Gaussian noise perturbations to the data degrades model learning performance. To address these issues, this paper proposes a differential privacy federated learning scheme based on adaptive Gaussian noise (DPFL-AGN). To protect the data privacy and security of the federated learning training process, adaptive Gaussian noise is added during training to hide the real parameters uploaded by the clients. In addition, this paper proposes an adaptive noise reduction method: as the model converges, the Gaussian noise in the later stages of training is reduced adaptively. This paper conducts a series of simulation experiments on the real MNIST and CIFAR-10 datasets, and the results show that the DPFL-AGN algorithm performs better than the other algorithms.
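The adaptive noise-reduction idea can be sketched with a simple geometric schedule; the decay form, `decay` rate, and `sigma_min` floor are hypothetical choices for illustration, not the paper's DPFL-AGN rule:

```python
import numpy as np

def adaptive_sigma(sigma0, round_idx, decay=0.95, sigma_min=0.05):
    """Hypothetical schedule: geometrically shrink the Gaussian noise
    scale as training converges, never dropping below sigma_min."""
    return max(sigma_min, sigma0 * decay ** round_idx)

def perturb_update(update, sigma, rng):
    """Hide a client's real parameter update by adding Gaussian noise
    of the scheduled scale before uploading it to the server."""
    return update + rng.normal(0.0, sigma, size=update.shape)
```

Early rounds, where gradients are large and noisy anyway, receive the strongest perturbation; later rounds, where small perturbations would otherwise swamp fine-grained convergence, receive less.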
The proliferation of Large Language Models (LLMs) across various sectors has underscored the urgency of addressing potential privacy breaches. Vulnerabilities such as prompt injection attacks and other adversarial tactics can make these models inadvertently disclose their training data. Such disclosures could compromise personally identifiable information, posing significant privacy risks. In this paper, we propose a novel multi-faceted approach called Whispered Tuning to address privacy leaks in large language models (LLMs). We integrate a PII redaction model, differential privacy techniques, and an output filter into the LLM fine-tuning process to enhance confidentiality. Additionally, we introduce novel ideas such as the Epsilon Dial for adjustable privacy budgeting, with differentiated training phases per data-handler role. Through empirical validation, including attacks on non-private models, we demonstrate the robustness of our proposed solution, SecureNLP, in safeguarding privacy without compromising utility. This methodology significantly fortifies LLMs against privacy infringements, enabling responsible adoption across sectors.
Federated learning is a distributed machine learning technique that trains a global model by exchanging model parameters or intermediate results among multiple data sources. Although federated learning achieves physical isolation of data, the local data of federated learning clients are still at risk of leakage under attack by malicious individuals. For this reason, combining data protection techniques (e.g., differential privacy) with federated learning is a sure way to further improve the data security of federated learning models. In this survey, we review recent advances in research on differentially private federated learning models. First, we introduce the workflow of federated learning and the theoretical basis of differential privacy. Then, we review three differentially private federated learning paradigms: central differential privacy, local differential privacy, and distributed differential privacy. After this, we review the algorithmic optimization and communication-cost optimization of federated learning models with differential privacy. Finally, we review applications of federated learning models with differential privacy in various domains. By systematically summarizing the existing research, we propose future research opportunities.
To realize data sharing and fully exploit the value of data, breaking down the data islands between institutions to enable data collaboration has become a new sharing mode. This paper proposes a distributed data security sharing scheme based on the C/S communication mode and constructs a federated learning architecture that uses differential privacy technology to protect training parameters. Clients do not need to share local data; they only need to upload the trained model parameters to achieve data sharing. A distributed parameter update mechanism is introduced into the training process. The server is mainly responsible for issuing training commands and parameters and for aggregating the local model parameters uploaded by the clients. The client mainly uses the stochastic gradient descent algorithm for gradient clipping and updates, and transmits the trained model parameters back to the server after differential privacy processing. To test the performance of the scheme, in an application scenario where many medical institutions jointly train a disease detection system, the model is tested from multiple perspectives using medical data as an example. The testing results show that, for this specific test dataset, when the parameters are properly configured, the lowest prediction accuracy is 90.261% and the highest accuracy reaches 94.352%, indicating that the model performs well. The results also show that this scheme realizes data sharing while protecting data privacy and completes accurate prediction of diseases.
With the development of the Internet of Things (IoT), the delay caused by network transmission has led to low data-processing efficiency. At the same time, the limited computing power and available energy of IoT terminal devices are important bottlenecks that restrict the application of blockchain; edge computing can solve this problem. The emergence of edge computing can effectively reduce the delay of data transmission and improve data-processing capacity. However, user data in edge computing is usually stored and processed by honest-but-curious authorized entities, which leads to the leakage of users' private information. To solve these problems, this paper proposes a location data collection method that satisfies local differential privacy to protect users' privacy. A Voronoi diagram constructed by the Delaunay method is used to divide the road-network space and determine the Voronoi grid region where each edge node is located. A random disturbance mechanism that satisfies local differential privacy is then used to perturb the original location data in each Voronoi grid. In addition, the effectiveness of the proposed privacy-preserving mechanism is verified through comparison experiments. Compared with existing privacy-preserving methods, the proposed mechanism not only better meets users' privacy needs but also achieves higher data availability.
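A standard LDP disturbance mechanism for categorical regions such as Voronoi grid cells is generalized randomized response, sketched below; this is the textbook k-ary mechanism, offered here only as an illustration of the kind of perturbation the paper's method satisfies:

```python
import math
import random

def grr_perturb(region, regions, epsilon):
    """Generalized randomized response: report the true region with
    probability p = e^eps / (e^eps + k - 1), otherwise report one of
    the other k-1 regions uniformly at random."""
    k = len(regions)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p:
        return region
    return random.choice([r for r in regions if r != region])

def grr_estimate(reports, regions, epsilon):
    """Unbiased estimate of how many users are truly in each region."""
    k = len(regions)
    e = math.exp(epsilon)
    p = e / (e + k - 1)
    q = 1.0 / (e + k - 1)
    n = len(reports)
    return {r: (reports.count(r) - n * q) / (p - q) for r in regions}
```

The collector never sees a user's true cell, yet aggregate region counts remain recoverable with bounded error, which is exactly the availability-versus-privacy balance evaluated in the comparison experiments.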
Federated Learning (FL) is a new computing paradigm in privacy-preserving Machine Learning (ML), where the ML model is trained in a decentralized manner by the clients, preventing the server from directly accessing their privacy-sensitive data. Unfortunately, recent advances have shown potential risks of user-level privacy breaches under the cross-silo FL framework. In this paper, we propose to address the issue with a three-plane framework that secures cross-silo FL by taking advantage of the Local Differential Privacy (LDP) mechanism. The key insight is that LDP can provide strong data privacy protection while still retaining user data statistics, preserving high utility. Experimental results on three real-world datasets demonstrate the effectiveness of our framework.
By integrating the traditional power grid with information and communication technology, the smart grid achieves dependable, efficient, and flexible grid data processing. The smart meters deployed on the user side of the smart grid collect users' power-usage data on a regular basis and upload it to the control center to complete smart grid data acquisition. The control center can evaluate the supply and demand of the power grid through data aggregated from users and then dynamically adjust the power supply, price, and so on. However, since the grid data collected from users may disclose their electricity-usage habits and daily activities, privacy has become a critical concern in smart grid data aggregation. Most existing privacy-preserving data collection schemes for the smart grid adopt homomorphic encryption or randomization techniques, which are either impractical because of high computation overhead or unrealistic in requiring a trusted third party.
In recent years, with the continuous advancement of the intelligence of the Internet of Vehicles (IoV), the problem of privacy leakage in IoV has become increasingly prominent, and research on IoV privacy protection has become a focus of society. This paper analyzes the advantages and disadvantages of existing location privacy protection system structures and algorithms, proposes a privacy protection system structure based on an untrusted data collection server, and designs a vehicle location acquisition algorithm based on local differential privacy and a game model. The algorithm first meshes the road-network space. Then, a dynamic game model is introduced combining the user location privacy protection model and the attacker location semantic inference model, thereby minimizing the possibility of exposing the regional semantic privacy of the k-location set while maximizing the availability of the service. On this basis, a statistical method is designed that satisfies local differential privacy for k-location sets and obtains an unbiased estimate of traffic density in different regions. Finally, this paper verifies the algorithm on a dataset of mobile vehicles in Shanghai. The experimental results show that the algorithm can guarantee users' location privacy and location semantic privacy while satisfying service-quality requirements, providing better privacy protection and service for IoV users.
Health monitoring data, or data about infectious diseases such as COVID-19, may need to be constantly updated and dynamically released, but they may contain users' sensitive information. Thus, how to preserve users' privacy before release is critically important yet challenging. Differential Privacy (DP) is well known to provide effective privacy protection, and dynamic DP-preserving data release was designed to publish a histogram that meets the DP guarantee. Unfortunately, this scheme may result in high cumulative errors and lower data availability. To address this problem, in this paper we apply Jensen-Shannon (JS) divergence to design an OPTICS (Ordering Points To Identify The Clustering Structure)-based scheme. It uses JS divergence to measure the difference between the updated dataset at the current release time and the private dataset at the previous release time. The difference is compared with a threshold, and only when it exceeds the threshold do we apply OPTICS to publish the DP-protected dataset. Our experimental results show that the absolute errors and average relative errors are significantly lower than those of existing works.
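The JS-divergence test described above reduces to a short computation over the two release-time distributions; a minimal sketch (using the base-2 formulation, so the divergence is bounded in [0, 1]):

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions
    (sequences of probabilities summing to 1), in bits: the average
    KL divergence of p and q from their midpoint mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Because the divergence is symmetric and bounded, comparing it against a fixed threshold gives a stable "has the data changed enough to spend budget on a new release?" trigger.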
Frequent itemset mining is an essential problem in data mining and plays a key role in many data mining applications. However, users' personal privacy may be leaked in the mining process. In recent years, applying local differential privacy protection models to mine frequent itemsets has proven a relatively reliable and secure protection method. Local differential privacy means that users first perturb their original data and then send the perturbed data to the aggregator, preventing the aggregator from learning users' private information. We propose a novel framework that implements frequent itemset mining under local differential privacy and is applicable to users' multi-attribute data. The main technique uses bitmap encoding to convert each user's original data into a binary string. The framework also covers how to choose the best perturbation algorithm for varying user attributes, and uses the frequent pattern tree (FP-tree) algorithm to mine frequent itemsets. Finally, we incorporate the threshold random response (TRR) algorithm into the framework, compare it with existing algorithms, and demonstrate that TRR achieves higher accuracy for mining frequent itemsets.
In recent years, mobile Internet technology and location-based services have been widely applied, and application providers and users have accumulated huge amounts of trajectory data. While publishing and analyzing user trajectory data has brought great convenience, the risks of user-privacy disclosure caused by trajectory data publishing are becoming more and more prominent. Traditional k-anonymous trajectory data publishing technologies cannot effectively protect user privacy against attackers with strong background knowledge. For privacy-preserving trajectory data publishing, we propose a differential privacy based (k-Ψ)-anonymity method to defend against re-identification and probabilistic inference attacks. The proposed method is divided into two phases. In the first phase, a dummy-based (k-Ψ)-anonymous trajectory data publishing algorithm is given, which improves (k-δ)-anonymity by considering changes of the threshold δ on different road segments and constructing an adaptive threshold set Ψ that takes road-network information into account. In the second phase, Laplace noise based on the distance between anonymous locations under differential privacy is used to perturb the anonymous trajectory dataset output by the first phase. Experiments on a real road-network dataset show that the proposed method improves trajectory indistinguishability and achieves good data utility while preserving user privacy.
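The Laplace perturbation step in the second phase can be sketched as coordinate-wise Laplace noise on a location; note this is a generic ε-DP Laplace-mechanism illustration with an assumed `sensitivity` parameter, whereas the paper calibrates its noise to distances between anonymous locations:

```python
import math
import random

def laplace_noise(scale):
    """Draw one Laplace(0, scale) sample via the inverse-CDF method."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def perturb_location(x, y, epsilon, sensitivity):
    """Add independent Laplace noise of scale sensitivity/epsilon to
    each coordinate of an anonymized location."""
    b = sensitivity / epsilon
    return x + laplace_noise(b), y + laplace_noise(b)
```

Smaller ε gives a larger noise scale b, so perturbed trajectories become harder to re-identify at the cost of lower spatial utility.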
Mobile edge computing (MEC) is an emerging technology that extends cloud computing to the edge of a network. MEC has been applied to a variety of services. In particular, MEC can help reduce network delay and improve the service quality of recommendation systems. In a MEC-based recommendation system, users' rating data are collected and analyzed by the edge servers. If the servers behave dishonestly or break down, users' privacy may be disclosed. To solve this issue, we design a recommendation framework that applies local differential privacy (LDP) to collaborative filtering. In the proposed framework, users' rating data are perturbed to satisfy LDP and then released to the edge servers. The edge servers perform part of the computing task using the perturbed data, and the cloud computing center computes the similarity between items using the results generated by the edge servers. We propose a data perturbation method to protect users' original rating values, in which the Harmony mechanism is modified to preserve the accuracy of similarity computation. To further enhance privacy protection, we propose two methods that protect both users' rating values and their rating behaviors. Experimental results on real-world data demonstrate that the proposed methods outperform existing differentially private recommendation methods.
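The unmodified Harmony mechanism that the framework builds on can be sketched for a rating normalized into [-1, 1]: discretize to ±1, randomize the sign, and rescale so the report is an unbiased estimate of the original value (this sketch omits the paper's modifications for similarity computation):

```python
import math
import random

def harmony_perturb(v, epsilon):
    """Harmony-style numeric LDP perturbation for v in [-1, 1]:
    discretize v to a sign s = ±1 with probability (1 + v) / 2, flip s
    with probability 1 / (e^eps + 1), then scale so that the expected
    report equals v."""
    e = math.exp(epsilon)
    s = 1 if random.random() < (1 + v) / 2 else -1
    if random.random() >= e / (e + 1):
        s = -s
    return s * (e + 1) / (e - 1)
```

Each report is just one of two values, so individual ratings are hidden, while the mean of many reports converges to the true mean rating.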
There are growing concerns surrounding the data security of social networks because large amounts of user information and sensitive data are collected. Differential privacy is an effective method for privacy protection that provides rigorous and quantitative guarantees. Concerning the application of differential privacy in social networks, this paper analyzes current research trends and provides background information, including privacy-protection standards and noise mechanisms. Focusing on the privacy protection of social-network data publishing, a graph-publishing model is designed to provide differential privacy in social networks via three steps. First, exploiting the feature of social networks that two nodes possessing certain common properties are associated with higher probability, the raw graph is divided into several disconnected sub-graphs, yielding dense adjacency matrices and the number of bridges. Second, taking advantage of quad-trees, dense-region exploration of the adjacency matrices is conducted. Finally, using an exponential mechanism and the leaf nodes of the quad-trees, the adjacency matrix of the sanitized graph is reconstructed. In addition, a set of experiments is conducted to evaluate the model's feasibility, availability, and strengths using three analysis techniques: degree distribution, shortest path, and clustering coefficients.
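The exponential mechanism used in the reconstruction step selects a candidate (here, abstractly, any discrete choice such as a quad-tree leaf) with probability proportional to an exponentially weighted quality score; a generic sketch, not the paper's exact scoring function:

```python
import math
import random

def exponential_mechanism(candidates, score, epsilon, sensitivity=1.0):
    """Select a candidate with probability proportional to
    exp(epsilon * score(c) / (2 * sensitivity)), the standard
    exponential mechanism for non-numeric outputs."""
    weights = [math.exp(epsilon * score(c) / (2 * sensitivity))
               for c in candidates]
    r = random.random() * sum(weights)
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]  # guard against floating-point round-off
```

As ε grows, the mechanism concentrates on the highest-scoring candidate; as ε shrinks, the choice approaches uniform, trading utility for privacy.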
Key-value data is a typical data structure generated by mobile devices, and the collection and analysis of data from mobile devices are critical for service providers to improve service quality. Nevertheless, collecting raw data, which may contain various personal information, would lead to serious personal privacy leaks. Local differential privacy (LDP) has been proposed to protect privacy on the device side so that the server cannot obtain the raw data. However, existing mechanisms assume that all keys are equally sensitive, which cannot produce high-precision statistical results. A utility-improved data collection framework with LDP for key-value formed mobile data is proposed to solve this issue. More specifically, we divide the key-value data into sensitive and non-sensitive parts and only provide an LDP-equivalent privacy guarantee for sensitive keys and all values. We instantiate our framework with a utility-improved key-value unary encoding (UKV-UE) mechanism based on unary encoding, with which our framework can work effectively for a large key domain. We then validate our mechanism, which provides better utility and is suitable for mobile devices, by evaluating it on two real datasets. Finally, some possible future research directions are envisioned.
Privacy protection is a hot research topic in the information security field. An improved XGBoost algorithm is proposed to protect privacy in classification tasks. By combining XGBoost with differential privacy protection, the algorithm can improve classification accuracy while protecting private information. When using a CART regression tree to build a single decision tree, noise is added according to the Laplace mechanism. Compared with the random forest algorithm, this algorithm reduces computation cost and prevents overfitting to a certain extent. The experimental results show that the proposed algorithm is more effective than other traditional algorithms while protecting the private information in the training data.
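One common place to apply Laplace noise in a differentially private regression tree is the leaf value; the sketch below is a hypothetical illustration of that pattern (residual clipping to bound sensitivity, then Laplace noise on the leaf mean), not the paper's exact noise placement:

```python
import numpy as np

def private_leaf_value(residuals, epsilon, clip=1.0, rng=None):
    """Hypothetical DP leaf value: clip residuals to [-clip, clip] so
    one record changes the mean by at most 2*clip/n, then add Laplace
    noise with scale sensitivity/epsilon (standard Laplace mechanism
    for an epsilon-DP mean)."""
    rng = rng or np.random.default_rng(0)
    r = np.clip(np.asarray(residuals, dtype=float), -clip, clip)
    sensitivity = 2.0 * clip / len(r)
    return r.mean() + rng.laplace(0.0, sensitivity / epsilon)
```

Clipping is what keeps the sensitivity, and hence the injected noise, small even when a single training record carries an extreme residual.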
Many areas now experience data streams that contain privacy-sensitive information. Although sharing and releasing these data are of great commercial value, releasing them directly would disclose private user information. Therefore, how to continuously generate publishable histograms (meeting privacy-protection requirements) based on sliding data-stream windows has become a critical issue, especially when sending data to an untrusted third party. Existing histogram-publication methods are unsatisfactory in terms of time and storage costs because they must cache all elements in the current sliding window (SW). Our work addresses this drawback by designing an efficient online histogram publication (EOHP) method for local differential privacy data streams. Specifically, in the EOHP method, the data collector first crafts a histogram of the current SW using an approximate counting method. Second, the data collector reduces the privacy budget by using an optimized budget-absorption mechanism and adds appropriate noise to the approximate histogram, making it possible to publish the histogram while retaining satisfactory data utility. Extensive experimental results on two different real datasets show that the EOHP algorithm significantly reduces time and storage costs and improves data utility compared with existing algorithms.
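A simplified version of budget-conserving stream release is the update-or-skip pattern: republish the previous noisy histogram unless the new window has drifted past a threshold. The sketch below is a hypothetical baseline for intuition only; EOHP's approximate counting and optimized budget absorption are more involved:

```python
import numpy as np

def publish_stream(histograms, epsilon_per_release, threshold, rng=None):
    """Hypothetical update-or-skip release: noise the first window
    with Laplace(1/epsilon) noise; afterwards republish the previous
    release unless the new window's L1 distance from it exceeds
    `threshold`, in which case spend budget on a fresh noisy release."""
    rng = rng or np.random.default_rng(0)
    released, last = [], None
    for h in histograms:
        h = np.asarray(h, dtype=float)
        if last is None or np.abs(h - last).sum() > threshold:
            last = h + rng.laplace(0.0, 1.0 / epsilon_per_release,
                                   size=h.shape)
        released.append(last.copy())
    return released
```

Skipped timestamps consume no fresh budget, which is the basic intuition behind absorbing budget from stable windows to spend on windows that actually change.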
Under the general trend of the rapid development of smart grids, data security and privacy face serious challenges; protecting the private data of individual users while still obtaining user-aggregated data has attracted widespread attention. In this study, we propose an encryption scheme based on differential privacy for the problem of user-privacy leakage when aggregating data from multiple smart meters. First, we use an improved homomorphic encryption method to realize the encrypted aggregation of users' data. Second, we propose a double-blind noise-addition protocol that generates distributed noise through interaction between users and a cloud platform, preventing semi-honest participants from stealing data by colluding with one another. Finally, simulation results show that the proposed scheme can encrypt the transmission of multi-smart-meter data while satisfying the differential privacy mechanism: even if an attacker has sufficient background knowledge, the security of each user's electricity information can be ensured.
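The distributed-noise idea behind such protocols can be sketched in its simplest (unencrypted) form: each meter adds a small Gaussian noise share so that no individual report is exposed in the clear, while the shares sum to a single calibrated noise term in the aggregate. This sketch deliberately omits the homomorphic encryption and double-blind interaction of the actual scheme:

```python
import numpy as np

def aggregate_with_distributed_noise(readings, sigma_total, rng=None):
    """Sketch of distributed noise addition: each of the n meters adds
    Gaussian noise of scale sigma_total / sqrt(n); because independent
    Gaussian variances add, the aggregate carries one Gaussian noise
    term of scale sigma_total, with no trusted party adding it."""
    rng = rng or np.random.default_rng(0)
    n = len(readings)
    share_sigma = sigma_total / np.sqrt(n)
    noisy = [x + rng.normal(0.0, share_sigma) for x in readings]
    return sum(noisy)
```

The design point is that the total noise needed for the privacy guarantee is generated collectively, so no single semi-honest party ever holds both a raw reading and the key to remove its mask.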
Differential privacy is an essential approach for privacy preservation in data queries. However, users face a significant challenge in selecting an appropriate privacy scheme, as they struggle to balance the utility of query results against the preservation of diverse individual privacy. Customizing a privacy scheme becomes even more complex for queries that involve multiple data attributes. When adversaries attempt to breach privacy firewalls by conducting multiple regular data queries with various attribute values, data owners must arduously discern unpredictable disclosure risks and construct suitable privacy schemes. In this paper, we propose a visual analysis approach for formulating differential privacy schemes. Our approach supports the identification and simulation of potential privacy attacks when querying statistical results of multi-dimensional databases. We also developed a prototype system, called DPKnob, which integrates multiple coordinated views. DPKnob not only allows users to interactively assess and explore privacy-exposure risks by browsing high-risk attacks, but also facilitates an iterative process for formulating and optimizing privacy schemes based on differential privacy. This iterative process allows users to compare different schemes, refine their expectations of privacy and utility, and ultimately establish a well-balanced privacy scheme. The effectiveness of this study is verified by a user study and two case studies with real-world datasets.
Funding: supported by a grant from the National Key R&D Program of China.
Funding: Supported in part by the Natural Science Foundation of Henan Province (Grant No. 202300410510), the Consulting Research Project of the Chinese Academy of Engineering (Grant No. 2020YNZH7), the Key Scientific Research Project of Colleges and Universities in Henan Province (Grant Nos. 23A520043 and 23B520010), the International Science and Technology Cooperation Project of Henan Province (Grant No. 232102521004), the National Key Research and Development Program of China (Grant No. 2020YFB1005404), and the Henan Provincial Science and Technology Research Project (Grant No. 212102210100).
Abstract: The rapid evolution of artificial intelligence (AI) technologies has significantly propelled the advancement of the Internet of Vehicles (IoV). With AI support, represented by machine learning technology, vehicles gain the capability to make intelligent decisions. As a distributed learning paradigm, federated learning (FL) has emerged as a preferred solution in IoV. Compared to traditional centralized machine learning, FL reduces communication overhead and improves privacy protection. Despite these benefits, FL still faces security and privacy concerns, such as poisoning attacks and inference attacks, prompting exploration of blockchain integration to enhance its security posture. This paper introduces a novel blockchain-enabled federated learning (BCFL) scheme with differential privacy (DP) tailored for IoV. To meet the demands of the performance-critical IoV environment, the proposed methodology integrates a consortium blockchain with Practical Byzantine Fault Tolerance (PBFT) consensus, which offers superior efficiency over conventional public blockchains. In addition, the proposed approach applies the Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm in the local training process of FL for enhanced privacy protection. Experimental results indicate that the integration of blockchain elevates the security level of FL, in that the proposed approach effectively safeguards FL against poisoning attacks. On the other hand, the additional overhead associated with blockchain integration is limited to a moderate level that meets the efficiency criteria of IoV. Furthermore, by incorporating DP, the proposed approach is shown to satisfy an (ε, δ)-privacy guarantee while maintaining an acceptable level of model accuracy. This enhancement effectively mitigates the threat of inference attacks on private information.
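DP-SGD's core step is clipping each per-example gradient to a bounded L2 norm and adding Gaussian noise calibrated to that bound before averaging. The sketch below is a generic illustration of the algorithm the abstract names, not the paper's exact configuration; the parameter names are ours.

```python
import math
import random

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """One DP-SGD aggregation step: clip, sum, add Gaussian noise, average."""
    # Clip each per-example gradient so its L2 norm is at most clip_norm.
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        clipped.append([x * scale for x in g])
    # Sum the clipped gradients and add noise scaled to the clipping bound.
    dim = len(per_example_grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    n = len(per_example_grads)
    return [x / n for x in noisy]
```

Because every example's contribution is bounded by `clip_norm`, the Gaussian noise yields a per-step (ε, δ) guarantee that composes across training rounds.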
Funding: Supported by the Sichuan Provincial Science and Technology Department Project under Grant 2019YFN0104, the Yibin Science and Technology Plan Project under Grant 2021GY008, and the Sichuan University of Science and Engineering Postgraduate Innovation Fund Project under Grant Y2022154.
Abstract: As a distributed machine learning method, federated learning (FL) has the advantage of naturally protecting data privacy: it keeps data local and trains local models on local data. Federated learning effectively addresses the problems of data islands and privacy protection. However, existing research shows that attackers may still steal user information by analyzing the parameters exchanged during federated learning training and the aggregated parameters on the server side. To counter this, differential privacy (DP) techniques are widely used for privacy protection in federated learning. However, adding Gaussian noise perturbations to the data degrades model learning performance. To address these issues, this paper proposes a differential privacy federated learning scheme based on adaptive Gaussian noise (DPFL-AGN). To protect the data privacy and security of the federated learning training process, adaptive Gaussian noise is added during training to hide the real parameters uploaded by the clients. In addition, this paper proposes an adaptive noise reduction method: as the model converges, the Gaussian noise in the later stages of federated learning training is reduced adaptively. A series of simulation experiments on the real MNIST and CIFAR-10 datasets shows that the DPFL-AGN algorithm performs better than the other algorithms.
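The abstract does not give DPFL-AGN's exact noise schedule. One simple way to realize "noise that shrinks as training converges" is a round-indexed decay with a floor, sketched below; the schedule, names, and constants are our illustrative assumptions, not the paper's method.

```python
import random

def adaptive_sigma(base_sigma, round_idx, decay=0.9, min_sigma=0.1):
    """Exponentially decay the Gaussian noise scale over training rounds,
    never dropping below a floor that preserves a minimum privacy level."""
    return max(min_sigma, base_sigma * (decay ** round_idx))

def perturb_update(update, sigma, rng):
    """Add per-coordinate Gaussian noise to a client's model update."""
    return [u + rng.gauss(0.0, sigma) for u in update]
```

Early rounds, where gradients are large and noisy anyway, absorb heavy perturbation; later rounds receive less noise so the converged model's accuracy is not drowned out.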
Abstract: The proliferation of Large Language Models (LLMs) across various sectors has underscored the urgency of addressing potential privacy breaches. Vulnerabilities such as prompt injection attacks and other adversarial tactics can make these models inadvertently disclose their training data. Such disclosures could compromise personally identifiable information, posing significant privacy risks. In this paper, we proposed a novel multi-faceted approach called Whispered Tuning to address privacy leaks in large language models (LLMs). We integrated a PII redaction model, differential privacy techniques, and an output filter into the LLM fine-tuning process to enhance confidentiality. Additionally, we introduced novel ideas like the Epsilon Dial for adjustable privacy budgeting across differentiated training phases per data-handler role. Through empirical validation, including attacks on non-private models, we demonstrated the robustness of our proposed solution, SecureNLP, in safeguarding privacy without compromising utility. This methodology significantly fortifies LLMs against privacy infringements, enabling responsible adoption across sectors.
Abstract: Federated learning is a distributed machine learning technique that trains a global model by exchanging model parameters or intermediate results among multiple data sources. Although federated learning achieves physical isolation of data, the local data of federated learning clients are still at risk of leakage under the attack of malicious individuals. For this reason, combining data protection techniques (e.g., differential privacy techniques) with federated learning is a sure way to further improve the data security of federated learning models. In this survey, we review recent advances in the research of differentially-private federated learning models. First, we introduce the workflow of federated learning and the theoretical basis of differential privacy. Then, we review three differentially-private federated learning paradigms: central differential privacy, local differential privacy, and distributed differential privacy. After this, we review the algorithmic optimization and communication cost optimization of federated learning models with differential privacy. Finally, we review the applications of federated learning models with differential privacy in various domains. By systematically summarizing the existing research, we propose future research opportunities.
Funding: This work was supported by the Nanjing Institute of Technology (No. KE21-451).
Abstract: To realize data sharing and fully exploit the value of data, breaking down data islands between institutions to enable data collaboration has become a new sharing mode. This paper proposes a distributed data security sharing scheme based on the C/S communication mode and constructs a federated learning architecture that uses differential privacy technology to protect training parameters. Clients do not need to share local data; they only upload the trained model parameters to achieve data sharing. A distributed parameter update mechanism is introduced into the training process. The server is mainly responsible for issuing training commands and parameters and for aggregating the local model parameters uploaded by the clients. The client mainly uses the stochastic gradient descent algorithm for gradient clipping and updates, and transmits the trained model parameters back to the server after differential-privacy processing. To test the performance of the scheme, in an application scenario where many medical institutions jointly train a disease detection system, the model is evaluated from multiple perspectives using medical data as an example. The test results show that, for this specific test dataset, when the parameters are properly configured, the lowest prediction accuracy is 90.261% and the highest accuracy is up to 94.352%, indicating good model performance. The results also show that this scheme realizes data sharing while protecting data privacy and achieves accurate disease prediction.
Abstract: With the development of the Internet of Things (IoT), the delay caused by network transmission has led to low data processing efficiency. At the same time, the limited computing power and available energy of IoT terminal devices are important bottlenecks restricting the application of blockchain; edge computing can address this problem. The emergence of edge computing can effectively reduce the delay of data transmission and improve data processing capacity. However, user data in edge computing is usually stored and processed by honest-but-curious authorized entities, which leads to the leakage of users' private information. To solve these problems, this paper proposes a location data collection method that satisfies local differential privacy to protect users' privacy. A Voronoi diagram constructed by the Delaunay method is used to divide the road network space and determine the Voronoi grid region where each edge node is located. A random perturbation mechanism that satisfies local differential privacy is used to perturb the original location data in each Voronoi grid. In addition, the effectiveness of the proposed privacy-preserving mechanism is verified through comparison experiments. Compared with existing privacy-preserving methods, the proposed mechanism not only better meets users' privacy needs but also achieves higher data availability.
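The abstract's "random perturbation mechanism" over Voronoi grid cells is not spelled out; a standard LDP choice over a finite set of cells is k-ary (generalized) randomized response, sketched here with illustrative names. The true cell is reported with probability e^ε/(e^ε+k−1); otherwise a uniformly random other cell is reported, and the aggregator debiases the counts.

```python
import math
import random

def grr_perturb(true_cell, cells, eps, rng):
    """Generalized randomized response over k grid cells."""
    k = len(cells)
    p = math.exp(eps) / (math.exp(eps) + k - 1)  # report the true cell
    if rng.random() < p:
        return true_cell
    others = [c for c in cells if c != true_cell]
    return rng.choice(others)

def grr_estimate(reports, cells, eps):
    """Debias reported cell counts into estimated true counts."""
    k = len(cells)
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    q = 1.0 / (math.exp(eps) + k - 1)
    n = len(reports)
    counts = {c: 0 for c in cells}
    for r in reports:
        counts[r] += 1
    return {c: (counts[c] - n * q) / (p - q) for c in cells}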
Funding: Supported by the National Key R&D Program of China under Grant 2020YFB1806904, by the National Natural Science Foundation of China under Grants 61872416, 62171189, 62172438, and 62071192, by the Fundamental Research Funds for the Central Universities of China under Grants 2019kfyXJJS017, 31732111303, and 31512111310, and by the special fund for Wuhan Yellow Crane Talents (Excellent Young Scholar).
Abstract: Federated Learning (FL) is a new computing paradigm in privacy-preserving Machine Learning (ML), where the ML model is trained in a decentralized manner by the clients, preventing the server from directly accessing the clients' privacy-sensitive data. Unfortunately, recent advances have shown potential risks of user-level privacy breaches under the cross-silo FL framework. In this paper, we propose to address the issue by using a three-plane framework to secure cross-silo FL, taking advantage of the Local Differential Privacy (LDP) mechanism. The key insight is that LDP can provide strong data privacy protection while still retaining user data statistics to preserve high utility. Experimental results on three real-world datasets demonstrate the effectiveness of our framework.
Funding: Supported in part by the National Natural Science Foundation of China under Grant No. 61972371 and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (CAS) under Grant No. Y202093.
Abstract: By integrating the traditional power grid with information and communication technology, the smart grid achieves dependable, efficient, and flexible grid data processing. The smart meters deployed on the user side of the smart grid collect users' power usage data on a regular basis and upload it to the control center to complete smart grid data acquisition. The control center can evaluate the supply and demand of the power grid through data aggregated from users and then dynamically adjust the power supply and price. However, since the grid data collected from users may disclose users' electricity usage habits and daily activities, privacy has become a critical issue in smart grid data aggregation. Most existing privacy-preserving data collection schemes for the smart grid adopt homomorphic encryption or randomization techniques, which are either impractical because of high computation overhead or unrealistic because they require a trusted third party.
Funding: This work is supported by the Major Scientific and Technological Special Project of Guizhou Province (20183001), the project "Research on the education mode for complicate skill students in new media with cross specialty integration" (22150117092), and the Open Foundation of the Guizhou Provincial Key Laboratory of Public Big Data (2018BDKFJJ014, 2018BDKFJJ019, and 2018BDKFJJ022).
Abstract: In recent years, as the Internet of Vehicles (IoV) has become increasingly intelligent, the problem of privacy leakage in IoV has become increasingly prominent, and privacy protection in IoV has become a research focus. This paper analyzes the advantages and disadvantages of existing location privacy protection system architectures and algorithms, proposes a privacy protection system architecture based on an untrusted data collection server, and designs a vehicle location acquisition algorithm based on local differential privacy and a game model. The algorithm first meshes the road network space. Then, a dynamic game model is introduced between the user location privacy protection model and the attacker location semantic inference model, thereby minimizing the possibility of exposing the regional semantic privacy of the k-location set while maximizing service availability. On this basis, a statistical method is designed that satisfies local differential privacy for k-location sets and obtains unbiased estimates of traffic density in different regions. Finally, the algorithm is verified on a dataset of mobile vehicles in Shanghai. The experimental results show that the algorithm can guarantee users' location privacy and location semantic privacy while satisfying service quality requirements, providing better privacy protection and service for IoV users.
Funding: Supported in part by the National Natural Science Foundation of China (No. 61672106), in part by the Natural Science Foundation of Beijing, China (L192023), and in part by the project of promoting the Classified Development of Beijing Information Science and Technology University (Nos. 5112211038 and 5112211039).
Abstract: Health monitoring data or data about infectious diseases such as COVID-19 may need to be constantly updated and dynamically released, but they may contain users' sensitive information. Thus, how to preserve users' privacy before release is critically important yet challenging. Differential Privacy (DP) is well known to provide effective privacy protection, and dynamic DP-preserving data release schemes were accordingly designed to publish histograms that meet the DP guarantee. Unfortunately, such schemes may incur high cumulative errors and lower data availability. To address this problem, in this paper we apply Jensen-Shannon (JS) divergence to design the OPTICS (Ordering Points To Identify The Clustering Structure) based scheme. It uses JS divergence to measure the difference between the updated dataset at the current release time and the private dataset at the previous release time. Only when the difference is greater than a threshold do we apply OPTICS to publish the DP-protected dataset. Our experimental results show that the absolute errors and average relative errors are significantly lower than those of existing works.
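The release decision the abstract describes, publishing only when the new data differ enough from the last release, can be sketched with a plain JS-divergence check. The threshold value and function names below are ours, and the clustering step is omitted.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def should_republish(old_hist, new_hist, threshold):
    """Release a fresh DP histogram only if the data shifted enough."""
    def normalize(h):
        s = sum(h)
        return [x / s for x in h]
    return js_divergence(normalize(old_hist), normalize(new_hist)) > threshold
```

Skipping releases when the divergence is small is what keeps the cumulative error (and privacy budget consumption) low across a long stream of updates.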
Funding: This paper is supported by the Inner Mongolia Natural Science Foundation (Grant Number: 2018MS06026; Sponsored Authors: Liu, H. and Ma, X.; Sponsor's Website: http://kjt.nmg.gov.cn/) and the Science and Technology Program of Inner Mongolia Autonomous Region (Grant Number: 2019GG116; Sponsored Authors: Liu, H. and Ma, X.; Sponsor's Website: http://kjt.nmg.gov.cn/).
Abstract: Frequent itemset mining is an essential problem in data mining and plays a key role in many data mining applications. However, users' personal privacy may be leaked in the mining process. In recent years, applying local differential privacy protection models to mine frequent itemsets has proved a relatively reliable and secure protection method. Local differential privacy means that users first perturb their original data and then send the perturbed data to the aggregator, preventing the aggregator from revealing users' private information. We propose a novel framework that implements frequent itemset mining under local differential privacy and is applicable to users with multiple attributes. The main technique uses bitmap encoding to convert a user's original data into a binary string. The framework also covers how to choose the best perturbation algorithm for varying user attributes, and uses the frequent pattern tree (FP-tree) algorithm to mine frequent itemsets. Finally, we incorporate the threshold random response (TRR) algorithm into the framework, compare it with existing algorithms, and demonstrate that the TRR algorithm achieves higher accuracy for mining frequent itemsets.
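The bitmap-encoding step, followed by per-bit randomized response before a report leaves the device, can be sketched as below. TRR's thresholding is not described in the abstract, so this shows only the generic encode-perturb-debias pattern; all names are illustrative.

```python
import math
import random

def bitmap_encode(user_items, item_domain):
    """Encode a user's itemset as a bit vector over the item domain."""
    return [1 if item in user_items else 0 for item in item_domain]

def rr_perturb_bits(bits, eps, rng):
    """Flip each bit independently with basic randomized response."""
    keep = math.exp(eps) / (math.exp(eps) + 1)  # P(bit is reported truthfully)
    return [b if rng.random() < keep else 1 - b for b in bits]

def estimate_support(noisy_bit_sums, n, eps):
    """Debias aggregated bit counts into per-item support estimates."""
    keep = math.exp(eps) / (math.exp(eps) + 1)
    return [(s - n * (1 - keep)) / (2 * keep - 1) for s in noisy_bit_sums]
```

The debiased supports of single items would then feed an FP-tree-style miner on the aggregator side; mining larger itemsets under LDP needs further machinery that this sketch omits.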
Funding: Supported by the Fundamental Research Funds for the Central Universities (No. GK201906009), the CERNET Innovation Project (No. NGII20190704), and the Science and Technology Program of Xi'an City (No. 2019216914GXRC005CG006-GXYD5.2).
Abstract: In recent years, mobile Internet technology and location-based services have found wide application. Application providers and users have accumulated huge amounts of trajectory data. While publishing and analyzing user trajectory data has brought great convenience, the disclosure risks to user privacy caused by trajectory data publishing are becoming more and more prominent. Traditional k-anonymous trajectory data publishing technologies cannot effectively protect user privacy against attackers with strong background knowledge. For privacy-preserving trajectory data publishing, we propose a differential-privacy-based (k-Ψ)-anonymity method to defend against re-identification and probabilistic inference attacks. The proposed method is divided into two phases. In the first phase, a dummy-based (k-Ψ)-anonymous trajectory data publishing algorithm is given, which improves (k-δ)-anonymity by considering changes of the threshold δ on different road segments and constructing an adaptive threshold set Ψ that takes road network information into account. In the second phase, Laplace noise regarding the distance of anonymous locations under differential privacy is used to perturb the anonymous trajectory dataset output by the first phase. Experiments on a real road network dataset show that the proposed method improves trajectory indistinguishability and achieves good data utility while preserving user privacy.
Funding: Supported by the National Natural Science Foundation of China (No. 61871037) and the Natural Science Foundation of Beijing (No. M21035).
Abstract: Mobile edge computing (MEC) is an emerging technology that extends cloud computing to the edge of a network. MEC has been applied to a variety of services; in particular, MEC can help reduce network delay and improve the service quality of recommendation systems. In a MEC-based recommendation system, users' rating data are collected and analyzed by the edge servers. If the servers behave dishonestly or break down, users' privacy may be disclosed. To solve this issue, we design a recommendation framework that applies local differential privacy (LDP) to collaborative filtering. In the proposed framework, users' rating data are perturbed to satisfy LDP and then released to the edge servers. The edge servers perform part of the computing task using the perturbed data. The cloud computing center computes the similarity between items using the results generated by the edge servers. We propose a data perturbation method to protect users' original rating values, in which the Harmony mechanism is modified so as to preserve the accuracy of similarity computation. To further enhance privacy protection, we propose two methods that protect both users' rating values and rating behaviors. Experimental results on real-world data demonstrate that the proposed methods perform better than existing differentially private recommendation methods.
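The Harmony mechanism that the framework modifies reports, for a rating v scaled into [-1, 1], one of two extreme values chosen so that the report is unbiased: E[report] = v. Below is a minimal single-value sketch with our naming; the paper's modification for similarity accuracy is not reproduced here.

```python
import math
import random

def harmony_perturb(v, eps, rng):
    """Perturb v in [-1, 1] to one of two extremes; E[report] = v."""
    c = (math.exp(eps) + 1) / (math.exp(eps) - 1)
    # Probability of reporting +c, chosen so the mechanism is unbiased.
    p = 0.5 + v * (math.exp(eps) - 1) / (2 * (math.exp(eps) + 1))
    return c if rng.random() < p else -c
```

Averaging many such reports recovers the true mean rating, which is why downstream similarity computation can still work on the perturbed data.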
Funding: Supported by the National Natural Science Foundation of China (No. 61105047), the National High Technology Research and Development Program of China (No. 2015IM030300), the Science and Technology Committee of Shanghai Support Project (No. 14JC1405800), and the Project of the Central Universities Fundamental Research of Tongji University.
Abstract: There are growing concerns surrounding the data security of social networks because large amounts of user information and sensitive data are collected. Differential privacy is an effective method for privacy protection that can provide rigorous and quantitative protection. Concerning the application of differential privacy in social networks, this paper analyzes current research trends and provides background information, including privacy protection standards and noise mechanisms. Focusing on the privacy protection of social network data publishing, a graph-publishing model is designed to provide differential privacy in social networks via three steps. Firstly, according to the feature of social networks that two nodes possessing certain common properties are associated with a higher probability, the raw graph is divided into several disconnected sub-graphs, and correspondingly dense adjacency matrices and the number of bridges are obtained. Secondly, taking advantage of quad-trees, dense-region exploration of the adjacency matrices is conducted. Finally, using an exponential mechanism and the leaf nodes of the quad-trees, the adjacency matrix of the sanitized graph is reconstructed. In addition, a set of experiments is conducted to evaluate the model's feasibility, availability, and strengths using three analysis techniques: degree distribution, shortest path, and clustering coefficients.
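The reconstruction step relies on the exponential mechanism, which selects an output with probability proportional to exp(ε·u/(2Δu)), where u is a utility score and Δu its sensitivity. A generic sketch follows; the candidate set and utility function here are illustrative, not the paper's quad-tree specifics.

```python
import math
import random

def exponential_mechanism(candidates, utility, eps, sensitivity, rng):
    """Sample a candidate with probability proportional to
    exp(eps * utility / (2 * sensitivity))."""
    weights = [math.exp(eps * utility(c) / (2 * sensitivity)) for c in candidates]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]  # guard against floating-point rounding
```

Higher-utility candidates are exponentially more likely to be chosen, but every candidate retains nonzero probability, which is what yields the differential privacy guarantee.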
Abstract: The structure of key-value data is a typical data structure generated by mobile devices. The collection and analysis of data from mobile devices are critical for service providers seeking to improve service quality. Nevertheless, collecting raw data, which may contain various kinds of personal information, would lead to serious personal privacy leaks. Local differential privacy (LDP) has been proposed to protect privacy on the device side so that the server cannot obtain the raw data. However, existing mechanisms assume that all keys are equally sensitive, which prevents them from producing high-precision statistical results. A utility-improved data collection framework with LDP for key-value formed mobile data is proposed to solve this issue. More specifically, we divide the key-value data into sensitive and non-sensitive parts and only provide an LDP-equivalent privacy guarantee for sensitive keys and all values. We instantiate our framework with a utility-improved key value-unary encoding (UKV-UE) mechanism based on unary encoding, with which our framework can work effectively for a large key domain. We then validate our mechanism, which provides better utility and is suitable for mobile devices, by evaluating it on two real datasets. Finally, some possible future research directions are envisioned.
基金This work is supported by the NSFC[Grant Nos.61772281,61703212,61602254]Jiangsu Province Natural Science Foundation[Grant No.BK2160968]the Priority Academic Program Development of Jiangsu Higher Edu-cation Institutions(PAPD)and Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology(CICAEET).
Abstract: Privacy protection is a hot research topic in the information security field. An improved XGBoost algorithm is proposed to protect privacy in classification tasks. By combining with differential privacy protection, XGBoost can improve classification accuracy while protecting privacy information. When using a CART regression tree to build a single decision tree, noise is added according to the Laplace mechanism. Compared with the random forest algorithm, this algorithm can reduce computation cost and prevent overfitting to a certain extent. The experimental results show that the proposed algorithm is more effective than other traditional algorithms while protecting the privacy information in the training data.
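Laplace noise with scale sensitivity/ε is the standard way to privatize a numeric quantity such as a regression tree's leaf weight. A minimal sketch using inverse-CDF sampling follows; where exactly the paper injects the noise during tree construction is not detailed in the abstract, so `privatize_leaf` is illustrative.

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw a Laplace(0, scale) sample via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def privatize_leaf(leaf_value, sensitivity, eps, rng):
    """Add Laplace noise with scale sensitivity/eps to one leaf weight."""
    return leaf_value + laplace_noise(sensitivity / eps, rng)
```

The noise is zero-mean with variance 2·(sensitivity/ε)², so smaller ε (stronger privacy) directly trades off against the accuracy of each leaf.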
Funding: Supported by the Anhui Provincial Natural Science Foundation, China (Nos. 2108085MF218 and 2022AH040052), the University Synergy Innovation Program of Anhui Province, China (No. GXXT-2023-021), the Key Program of the Natural Science Foundation of the Educational Commission of Anhui Province of China (No. 2022AH050319), and the National Natural Science Foundation of China (Nos. 62172003 and 61402008).
Abstract: Many areas now experience data streams that contain privacy-sensitive information. Although sharing and releasing these data are of great commercial value, releasing them directly would disclose private user information. Therefore, how to continuously generate publishable histograms (meeting privacy protection requirements) based on sliding windows over data streams has become a critical issue, especially when sending data to an untrusted third party. Existing histogram publication methods are unsatisfactory in terms of time and storage costs because they must cache all elements in the current sliding window (SW). Our work addresses this drawback by designing an efficient online histogram publication (EOHP) method for local differential privacy data streams. Specifically, in the EOHP method, the data collector first crafts a histogram of the current SW using an approximate counting method. Second, the data collector reduces the privacy budget by using an optimized budget absorption mechanism and adds appropriate noise to the approximate histogram, making it possible to publish the histogram while retaining satisfactory data utility. Extensive experimental results on two different real datasets show that the EOHP algorithm significantly reduces time and storage costs and improves data utility compared to other existing algorithms.
Funding: This work was supported by the National Natural Science Foundation of China (No. 51677059) and the Fujian Provincial University Engineering Research Center Open Fund (No. KF-D21009).
Abstract: Under the general trend of the rapid development of smart grids, data security and privacy face serious challenges; protecting the privacy of individual users while obtaining aggregated user data has attracted widespread attention. In this study, we propose an encryption scheme on the basis of differential privacy for the problem of user privacy leakage when aggregating data from multiple smart meters. First, we use an improved homomorphic encryption method to realize the encrypted aggregation of users' data. Second, we propose a double-blind noise addition protocol that generates distributed noise through interaction between users and a cloud platform, preventing semi-honest participants from stealing data by colluding with one another. Finally, the simulation results show that the proposed scheme can encrypt the transmission of data from multiple smart meters while satisfying the differential privacy mechanism. Even if an attacker has sufficient background knowledge, the security of each user's electricity information can be ensured.
Funding: Supported by the NSFC, China (62202244, U22B2034), and the Fundamental Research Funds for the Central Universities, China, Nankai University.
Abstract: Differential privacy is an essential approach for privacy preservation in data queries. However, users face a significant challenge in selecting an appropriate privacy scheme, as they struggle to balance the utility of query results with the preservation of diverse individual privacy. Customizing a privacy scheme becomes even more complex for queries that involve multiple data attributes. When adversaries attempt to breach privacy firewalls by conducting multiple regular data queries with various attribute values, data owners must arduously discern unpredictable disclosure risks and construct suitable privacy schemes. In this paper, we propose a visual analysis approach for formulating differential privacy schemes. Our approach supports the identification and simulation of potential privacy attacks when querying statistical results of multi-dimensional databases. We also developed a prototype system, called DPKnob, which integrates multiple coordinated views. DPKnob not only allows users to interactively assess and explore privacy exposure risks by browsing high-risk attacks, but also facilitates an iterative process for formulating and optimizing privacy schemes based on differential privacy. This iterative process allows users to compare different schemes, refine their expectations of privacy and utility, and ultimately establish a well-balanced privacy scheme. The effectiveness of this study is verified by a user study and two case studies with real-world datasets.