Funding: Supported by the Scientific and Technological Research Council of Türkiye under Project No. 122E670.
Abstract: Developing a privacy-preserving data publishing algorithm that prevents individuals' identities from being disclosed while not sacrificing data utility remains an important goal, because finding the trade-off between data privacy and data utility is an NP-hard problem and an active research area. When existing approaches are investigated, one of the most significant difficulties discovered is the presence of outlier data in the datasets. Outlier data has a negative impact on data utility. Furthermore, k-anonymity algorithms, which are commonly used in the literature, do not provide adequate protection against outlier data. In this study, a new data anonymization algorithm is devised and tested for boosting data utility by incorporating an outlier detection mechanism into the Mondrian algorithm. The connectivity-based outlier factor (COF) algorithm is used to detect outliers. Mondrian is selected because of its capacity to anonymize multidimensional data while meeting the needs of real-world data; COF, in turn, is used to discover outliers in high-dimensional datasets with complicated structures. The proposed algorithm generates more equivalence classes than the Mondrian algorithm and provides greater data utility than previous algorithms based on k-anonymization. In addition, it outperforms other algorithms on the discernibility metric (DM), normalized average equivalence class size (Cavg), global certainty penalty (GCP), query error rate, classification accuracy (CA), and F-measure metrics. Moreover, the results on the GCP and query error rate metrics demonstrate that the proposed algorithm achieves higher data utility by grouping closer data points when compared to other algorithms.
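A minimal sketch of the pipeline this abstract describes: pre-filter suspected outliers, then run a strict Mondrian median split on the remaining records. This is an illustration under stated assumptions, not the authors' implementation; a plain kNN-distance score stands in for COF, and the 95th-percentile cutoff, k=5, and the toy data are assumptions.

```python
import numpy as np

def knn_outlier_scores(X, n_neighbors=5):
    """Mean distance to the n nearest neighbours (a simple stand-in for COF)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return np.sort(d, axis=1)[:, :n_neighbors].mean(axis=1)

def mondrian(X, idx, k):
    """Strict Mondrian: recursively median-split on the widest dimension."""
    if len(idx) < 2 * k:                      # halves would drop below k records
        return [idx]
    spans = X[idx].max(axis=0) - X[idx].min(axis=0)
    dim = int(np.argmax(spans))
    order = idx[np.argsort(X[idx, dim])]
    mid = len(order) // 2
    return mondrian(X, order[:mid], k) + mondrian(X, order[mid:], k)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # toy quasi-identifier table
scores = knn_outlier_scores(X)
inliers = np.where(scores <= np.quantile(scores, 0.95))[0]  # drop top-5% scorers
classes = mondrian(X, inliers, k=5)
print(len(classes), "equivalence classes from", len(inliers), "inliers")
```

Removing the outliers before partitioning is what lets the median splits produce tighter, more numerous equivalence classes, which is the utility effect the abstract reports.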
Funding: Supported in part by the Research Fund for the Doctoral Program of Higher Education of China (No. 20120009110007), the Program for Innovative Research Team in University of the Ministry of Education of China (No. IRT201206), the Program for New Century Excellent Talents in University (NCET-110565), the Fundamental Research Funds for the Central Universities (No. 2012JBZ010), the Open Project Program of the Beijing Key Laboratory of Trusted Computing at Beijing University of Technology, and the Beijing Higher Education Young Elite Teacher Project (No. YETP0542).
Abstract: Privacy-preserving data publishing (PPDP) is one of the hot issues in the field of network security. Existing PPDP techniques cannot deal with generality attacks, which explicitly include the sensitivity attack and the similarity attack. This paper proposes a novel model, (w, γ, k)-anonymity, to avoid generality attacks on both numeric and categorical attributes. We show that the optimal (w, γ, k)-anonymity problem is NP-hard and design the Top-down Local recoding (TDL) algorithm to implement the model. Our experiments validate the improvement of our model on real data.
Funding: Supported by the National Natural Science Foundation of China (No. 62062016), the Doctoral Research Start-up Fund of Guangxi Normal University (RZ1900006676), and the Guangxi Project for Improving Middle-aged/Young Teachers' Ability (No. 2020KY020323).
Abstract: Most data publishing methods have not considered sensitivity protection, and hence an adversary can disclose privacy through a sensitivity attack. Faced with this problem, this paper presents a medical data publishing method based on sensitivity determination. To protect sensitivity, the sensitivity of disease information is determined by semantics. To seek the trade-off between information utility and privacy security, the new method focuses on the protection of sensitive values with high sensitivity and assigns highly sensitive disease information to groups as evenly as possible. The experiments are conducted on two real-world datasets whose records include various attributes of patients. To measure sensitivity protection, the authors define a metric that can evaluate the degree of sensitivity disclosure. In addition, information loss and discernibility metrics are used to measure the availability of the released tables. The experimental results indicate that the new method can provide better privacy than the traditional one while information utility is guaranteed. Beyond value protection, the proposed method provides sensitivity protection and usable releases for medical data.
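One way to read the even-assignment step is a round-robin deal of the highly sensitive records across groups before the rest are filled in. The sketch below assumes a precomputed per-disease sensitivity score; the threshold, group count, and record layout are assumptions, not the paper's exact procedure.

```python
from collections import defaultdict

def spread_sensitive(records, sensitivity, threshold, n_groups):
    """Deal highly sensitive records round-robin so no group concentrates them,
    then fill groups with the remaining records."""
    groups = defaultdict(list)
    high = [r for r in records if sensitivity[r["disease"]] >= threshold]
    low = [r for r in records if sensitivity[r["disease"]] < threshold]
    for i, r in enumerate(high):          # spread sensitive records first
        groups[i % n_groups].append(r)
    for i, r in enumerate(low):           # then distribute the rest
        groups[i % n_groups].append(r)
    return groups

sens = {"flu": 0.2, "HIV": 0.9, "cancer": 0.8}   # hypothetical semantic scores
recs = [{"id": i, "disease": d} for i, d in
        enumerate(["flu", "HIV", "flu", "cancer", "flu", "HIV"])]
for g, members in spread_sensitive(recs, sens, threshold=0.7, n_groups=2).items():
    print(g, [r["disease"] for r in members])
```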
Funding: Supported by the Fundamental Research Funds for the Central Universities (No. GK201906009), the CERNET Innovation Project (No. NGII20190704), and the Science and Technology Program of Xi'an City (No. 2019216914GXRC005CG006-GXYD5.2).
Abstract: In recent years, mobile Internet technology and location-based services have been widely applied, and application providers and users have accumulated huge amounts of trajectory data. While publishing and analyzing user trajectory data has brought great convenience, the risks of user privacy disclosure caused by trajectory data publishing are becoming more and more prominent. Traditional k-anonymous trajectory data publishing technologies cannot effectively protect user privacy against attackers with strong background knowledge. For privacy-preserving trajectory data publishing, we propose a differential privacy based (k-Ψ)-anonymity method to defend against re-identification and probabilistic inference attacks. The proposed method is divided into two phases. In the first phase, a dummy-based (k-Ψ)-anonymous trajectory data publishing algorithm is given, which improves (k-δ)-anonymity by considering changes of the threshold δ on different road segments and constructing an adaptive threshold set Ψ that takes road network information into account. In the second phase, Laplace noise calibrated to the distance of anonymous locations under differential privacy is used to perturb the anonymous trajectory dataset output by the first phase. Experiments on a real road network dataset show that the proposed method improves trajectory indistinguishability and achieves good data utility while preserving user privacy.
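The second phase's Laplace step, reduced to its arithmetic: each coordinate receives noise drawn from a Laplace distribution whose scale is the sensitivity divided by the privacy budget ε. The sketch below is the standard Laplace mechanism, not the paper's exact calibration; taking the anonymization distance Δ as the sensitivity, and the coordinate values, Δ, and ε, are assumptions.

```python
import numpy as np

def laplace_perturb(trajectory, delta, epsilon, rng):
    """Standard Laplace mechanism: noise scale = sensitivity / epsilon per coordinate."""
    scale = delta / epsilon
    noise = rng.laplace(loc=0.0, scale=scale, size=trajectory.shape)
    return trajectory + noise

rng = np.random.default_rng(42)
traj = np.array([[34.26, 108.94], [34.27, 108.95], [34.28, 108.97]])  # (lat, lon) points
noisy = laplace_perturb(traj, delta=0.01, epsilon=0.5, rng=rng)
print(noisy.round(4))
```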
Funding: National Natural Science Foundation of China (No. 61902060); Shanghai Sailing Program, China (No. 19YF1402100); Fundamental Research Funds for the Central Universities, China (No. 2232019D3-51).
Abstract: Speech data publishing breaches users' data privacy, thereby causing further privacy disclosure. Existing work sanitizes the content, voice, and voiceprint of speech data without considering the consistency among these three features, and is thus susceptible to inference attacks. To address the problem, we design a privacy-preserving protocol for speech data publishing (P3S2) that takes the correlations among the three factors into consideration. Concretely, we first propose a three-dimensional sanitization that uses feature learning to capture the characteristics of each dimension and then sanitizes the speech data using the learned features. As a result, the correlations among the three dimensions of the sanitized speech data are preserved. Furthermore, (ε, δ)-differential privacy is used to theoretically prove both the data privacy preservation and the data utility guarantee of P3S2, filling the gap between algorithm design and performance evaluation. Finally, simulations on two real-world datasets demonstrate both the data privacy preservation and the data utility guarantee.
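An (ε, δ)-guarantee of the kind the abstract invokes is commonly obtained with the Gaussian mechanism, calibrated by the classical bound σ = Δ₂·sqrt(2 ln(1.25/δ))/ε. The sketch below only shows that noise calibration on a feature vector; the learned three-dimensional sanitization itself is not reproduced, and the feature vector, sensitivity, and budget values are assumptions.

```python
import numpy as np

def gaussian_mechanism(features, l2_sensitivity, eps, delta, rng):
    """Classical (eps, delta)-DP Gaussian mechanism (bound valid for eps < 1)."""
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return features + rng.normal(0.0, sigma, size=features.shape)

rng = np.random.default_rng(7)
feat = rng.normal(size=16)               # stand-in for a learned speech feature vector
private = gaussian_mechanism(feat, l2_sensitivity=1.0, eps=0.9, delta=1e-5, rng=rng)
print(np.linalg.norm(private - feat))    # distortion introduced by the noise
```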
Abstract: With the increasing prevalence of social networks, more and more social network data are published for applications such as social network analysis and data mining. However, this brings privacy problems: for example, adversaries can easily obtain sensitive information about individuals with little background knowledge. How to publish social network data for analysis purposes while preserving the privacy of individuals has raised many concerns, and many algorithms have been proposed to address this issue. In this paper, we discuss this privacy problem from two aspects: attack models and countermeasures. We analyse privacy concerns, model the background knowledge that an adversary may utilize, and review recently developed attack models. We then survey the state-of-the-art privacy-preserving methods in two categories: anonymization methods and differential privacy methods. We also provide research directions in this area.
Funding: Doctoral Research Start-up Fund of Guangxi Normal University; Guangzhou Research Institute of Communication University of China Common Construction Project, Sunflower: the Aging Intelligent Community; Guangxi Project for Improving Middle-aged/Young Teachers' Ability (Grant/Award Number: 2020KY020323).
Abstract: Overgeneralisation may happen because most studies on data publishing for multiple sensitive attributes (SAs) have not considered personalised privacy requirements; moreover, sensitive information disclosure may also be caused by these personalised requirements. To address the matter, this article develops a personalised data publishing method for multiple SAs. According to the requirements of individuals, the new method partitions SA values into two categories, private values and public values, and breaks the association between them for privacy guarantees. The private values go through anonymisation, while the public values are released without this process. An algorithm is designed to achieve this privacy model, where the selectivity is determined by the sensitive-value frequency and undesirable objects. The experimental results show that the proposed method provides more information utility than previous methods. The theoretical analyses and experiments also indicate that privacy can be guaranteed even if the public values are known to an adversary. The overgeneralisation and privacy breaches caused by personalised requirements are avoided by the new method.
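One way to read the private/public split: each individual flags which of their SA values are private, private values are masked or anonymised, and public values pass through untouched. The flag structure, the suppression token, and the record layout below are assumptions for illustration; the paper's full anonymisation of private values is not reproduced.

```python
def personalised_release(record, private_flags, suppress="*"):
    """Release public SA values as-is; mask the ones the individual marked private
    (a sketch of the private/public partition, not the paper's full algorithm)."""
    return {attr: (suppress if private_flags.get(attr) else value)
            for attr, value in record.items()}

record = {"disease": "HIV", "salary": "85k", "religion": "none"}
flags = {"disease": True, "salary": False, "religion": True}  # individual's choices
print(personalised_release(record, flags))
# {'disease': '*', 'salary': '85k', 'religion': '*'}
```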
Funding: Supported by the Program for New Century Excellent Talents in Universities (Grant No. NCET-06-0290), the National Natural Science Foundation of China (Grant Nos. 60828004, 60503036), and the Fok Ying Tong Education Foundation Award (Grant No. 104027).
Abstract: Many data sharing applications require that published data protect sensitive information pertaining to individuals, such as the diseases of patients, the credit rating of a customer, or the salary of an employee. Meanwhile, certain information is required to be published. In this paper, we consider data-publishing applications where the publisher specifies both sensitive information and shared information. An adversary can infer the real value of a sensitive entry with high confidence from the published data. The goal is to protect sensitive information in the presence of data inference using association rules derived from the published data. We formulate the inference attack framework and develop complexity results. We show that computing a safe partial table is an NP-hard problem. We classify the general problem into subcases based on the requirements on the published information and propose algorithms for finding a safe partial table to publish. We have conducted an empirical study to evaluate these algorithms on real data. The test results show that the proposed algorithms can produce approximately maximal published data and improve the performance of existing algorithms.
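A toy version of the inference check this abstract describes: from a candidate partial table, compute the confidence of association rules that predict a sensitive value, and reject the release if any rule exceeds a threshold. Rule mining here is brute-force over single-attribute antecedents; the threshold, the table, and the None-as-hidden-cell convention are assumptions.

```python
from collections import Counter

def max_rule_confidence(rows, sensitive, antecedents):
    """Max confidence over rules (attr=value) -> (sensitive=value) in a partial table."""
    best = 0.0
    for attr in antecedents:
        pairs = Counter((r[attr], r[sensitive]) for r in rows if r[attr] is not None)
        support = Counter(r[attr] for r in rows if r[attr] is not None)
        for (a, _s), n in pairs.items():
            best = max(best, n / support[a])
    return best

partial = [  # None marks a hidden cell in the candidate partial table
    {"zip": "537", "age": "30s", "disease": "flu"},
    {"zip": "537", "age": "30s", "disease": "flu"},
    {"zip": "537", "age": None,  "disease": "HIV"},
    {"zip": "538", "age": "40s", "disease": "flu"},
]
conf = max_rule_confidence(partial, "disease", ["zip", "age"])
print("safe to publish" if conf < 0.9 else "unsafe", round(conf, 2))
```

A real safe-partial-table search would iterate this check while greedily hiding cells, which is where the NP-hardness result bites.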
Funding: This work was supported by the Postgraduate Research Grants Scheme (PGRS) under Grant No. PGRS190360.
Abstract: Publishing big data and making it accessible to researchers is important for knowledge building, as it helps in applying highly efficient methods to plan, conduct, and assess scientific research. However, publishing and processing big data poses a privacy concern: protecting individuals' sensitive information while maintaining the usability of the published data. Several anonymization methods, such as slicing and merging, have been designed as solutions to the privacy concerns of publishing big data. However, the major drawback of merging and slicing is the random permutation procedure, which does not always guarantee complete protection against attribute or membership disclosure. Moreover, merging procedures may generate many fake tuples, leading to a loss of data utility and subsequent erroneous knowledge extraction. This study therefore proposes a slicing-based enhanced method for privacy-preserving big data publishing that maintains data utility. In particular, the proposed method distributes the data into horizontal and vertical partitions. Lower and upper protection levels are then used to identify unique and identical attribute values, and these values are swapped to ensure the published big data is protected from disclosure risks. The outcome of the experiments demonstrates that the proposed method can maintain data utility and provide stronger privacy preservation.
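For orientation, here is the classic slicing skeleton the method builds on: columns are grouped into vertical buckets, rows into horizontal groups, and within each group the bucket values are permuted independently to break cross-bucket linkage. The paper's protection-level-driven swapping is replaced here by a plain random permutation, and the column grouping and toy table are assumptions.

```python
import numpy as np

def slice_table(rows, buckets, group_size, rng):
    """Slicing skeleton: permute each vertical bucket independently
    within every horizontal group to break cross-bucket linkage."""
    out = [dict(r) for r in rows]
    for start in range(0, len(out), group_size):
        group = out[start:start + group_size]
        for bucket in buckets:
            perm = rng.permutation(len(group))
            vals = [tuple(group[i][c] for c in bucket) for i in range(len(group))]
            for i, j in enumerate(perm):
                for c, v in zip(bucket, vals[j]):
                    group[i][c] = v
    return out

rng = np.random.default_rng(1)
table = [{"age": a, "zip": z, "disease": d}
         for a, z, d in [(34, "537", "flu"), (36, "537", "HIV"),
                         (41, "538", "flu"), (44, "539", "cold")]]
sliced = slice_table(table, buckets=[("age", "zip"), ("disease",)], group_size=2, rng=rng)
for r in sliced:
    print(r)
```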
Funding: Supported by the National Key Research and Development Program of China (Nos. 2019QY1402 and 2016YFB0800901).
Abstract: Recently, local differential privacy (LDP) has become the de facto standard for data sharing and analysis with high-level privacy guarantees. Existing LDP-based mechanisms mainly focus on learning statistical information about the entire population from sensitive data. For the first time in the literature, we use LDP for distance estimation between distributed data to support more complicated data analysis. Specifically, we propose PrivBV, a locally differentially private bit-vector mechanism with a distance-aware property in the anonymized space. We also present an optimization strategy for reducing privacy leakage in high-dimensional spaces. The distance-aware property of PrivBV brings new insights into complicated data analysis in distributed environments. As case studies, we show the feasibility of applying PrivBV to privacy-preserving record linkage and non-interactive clustering. Theoretical analysis and experimental results demonstrate the effectiveness of the proposed scheme.
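A minimal picture of a distance-aware LDP bit vector: each bit is kept with probability e^ε/(e^ε+1) and flipped otherwise (randomized response), and the Hamming distance between two perturbed vectors can be debiased because the flip probability is public. PrivBV's actual encoding and high-dimensional optimization are not reproduced; this only shows the randomized-response and debiasing arithmetic, with toy vectors as assumptions.

```python
import numpy as np

def perturb(bits, eps, rng):
    """Per-bit randomized response: keep with prob e^eps/(e^eps+1), else flip."""
    keep = rng.random(bits.shape) < np.exp(eps) / (np.exp(eps) + 1.0)
    return np.where(keep, bits, 1 - bits)

def estimate_hamming(u, v, eps):
    """Debias the observed Hamming distance using the public flip probability q:
    E[observed] = 2*m*q*(1-q) + d*(1-2q)^2, solved for d."""
    q = 1.0 / (np.exp(eps) + 1.0)
    m = len(u)
    observed = np.sum(u != v)
    return (observed - 2 * m * q * (1 - q)) / (1 - 2 * q) ** 2

rng = np.random.default_rng(3)
a = rng.integers(0, 2, size=1024)
b = a.copy()
b[:100] ^= 1                                 # true Hamming distance = 100
est = estimate_hamming(perturb(a, 2.0, rng), perturb(b, 2.0, rng), eps=2.0)
print("estimated distance:", round(float(est), 1))
```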
Abstract: Due to the widespread growth of cloud technology, virtual servers deployed on a cloud platform may collect useful data from a client and then jointly disclose the client's sensitive data without permission. Hence, from the perspective of cloud clients, it is very important to take effective technical measures to defend their privacy on the client side. Accordingly, different privacy protection techniques have been presented in the literature for safeguarding the original data. This paper presents a technique for privacy preservation of cloud data using the Kronecker product and Bat-algorithm-based coefficient generation. Overall, the proposed privacy preservation method is performed in two steps. In the first step, the PU coefficient is optimally determined using the PUBAT algorithm with a new objective function. In the second step, the input data and the PU coefficient are used to compute the privacy-protected data for publishing in the cloud environment. For the performance analysis, experiments are performed on three datasets, namely Cleveland, Switzerland, and Hungarian, and evaluation is performed using accuracy and DBDR. The proposed algorithm obtained an accuracy of 94.28%, whereas the existing algorithm obtained only 83.64%, demonstrating utility; on the other hand, the proposed algorithm obtained a DBDR of 35.28%, whereas the existing algorithm obtained only 12.89%, demonstrating the privacy measure.
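The core transform, stripped to its arithmetic: publishing the Kronecker product of a data block with a small coefficient matrix spreads every original value across a block of the output, so no published cell equals an original cell. How the paper actually combines the data and the PU coefficient is not specified in the abstract, so this is only a sketch of the Kronecker-product operation itself; the coefficient matrix is random here, whereas the paper would produce it with the Bat-algorithm optimization (PUBAT), which is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)
X = np.array([[63.0, 145.0], [67.0, 160.0]])    # toy record block (hypothetical values)
C = rng.uniform(0.5, 1.5, size=(2, 2))          # stand-in for the Bat-optimized PU coefficient

# Kronecker-product perturbation: each entry of X is scaled into a C-shaped block.
X_pub = np.kron(X, C)
print(X_pub.shape)   # (4, 4): every original cell becomes a 2x2 block
print(X_pub.round(2))
```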