Although k-anonymity is a good way of publishing microdata for research purposes, it cannot resist several common attacks, such as attribute disclosure and the similarity attack. To resist these attacks, many refinements of k-anonymity have been proposed, with t-closeness being one of the strictest privacy models. While most existing t-closeness models address the case in which the original data have only a single sensitive attribute, data with multiple sensitive attributes are more common in practice. In this paper, we close this gap by proposing two algorithms that make published data with multiple sensitive attributes satisfy t-closeness. Both build on the observation that, for the published data to satisfy t-closeness, the sensitive attribute values in any equivalence class must be spread as widely as possible over the entire data. The two algorithms differ in how they partition records into groups by their sensitive attributes: one uses a clustering method, while the other leverages principal component analysis. Then, according to the similarity of quasi-identifier attributes, records are selected from different groups to construct each equivalence class, which reduces information loss during anonymization as much as possible. The proposed algorithms are evaluated on a real dataset. The results show that the first algorithm is slower on average than the second but preserves more of the original information. In addition, compared with related approaches, both proposed algorithms achieve stronger privacy protection and incur less information loss.
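To make the t-closeness requirement concrete, the following is a minimal sketch of a per-attribute check, not the algorithms proposed in the paper. It assumes categorical sensitive attributes with an equal ground distance, in which case the Earth Mover's Distance reduces to half the L1 distance between the two distributions. Treating multiple sensitive attributes by checking each one separately is also an assumption made here for illustration, and the names `distribution`, `emd_equal_distance`, and `satisfies_t_closeness` are hypothetical.

```python
from collections import Counter


def distribution(values):
    """Return the empirical distribution of a non-empty list of values."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def emd_equal_distance(p, q):
    """EMD between two categorical distributions under equal ground distance.

    With every pair of distinct values at distance 1, the EMD equals
    half the sum of absolute probability differences.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def satisfies_t_closeness(table, classes, sensitive_attrs, t):
    """Check every equivalence class against the whole table.

    table: list of dict records (hypothetical layout)
    classes: list of lists of row indices, one list per equivalence class
    Assumption: each sensitive attribute is checked independently.
    """
    for attr in sensitive_attrs:
        overall = distribution([row[attr] for row in table])
        for ec in classes:
            local = distribution([table[i][attr] for i in ec])
            if emd_equal_distance(local, overall) > t:
                return False
    return True
```

Under this check, an equivalence class passes only if its sensitive values mirror the overall distribution closely, which matches the observation that each class should draw records whose sensitive values are spread over the entire data.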
Anonymized data publication has received considerable attention from the research community in recent years. Most existing privacy-preserving data publishing techniques concentrate on microdata with multiple categorical sensitive attributes or with only one numerical sensitive attribute. However, many real-world applications contain multiple numerical sensitive attributes. Directly applying the existing techniques for a single numerical sensitive attribute or for multiple categorical sensitive attributes to such data often causes unexpected disclosure of private information. These techniques are particularly prone to the proximity breach, a privacy threat specific to numerical sensitive attributes in data publication. In this paper, we propose a privacy-preserving data publishing method, namely MNSACM, which uses the ideas of clustering and Multi-Sensitive Bucketization (MSB) to publish microdata with multiple numerical sensitive attributes. We use an example to show the effectiveness of this method in protecting privacy when the data contain multiple numerical sensitive attributes.
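As an illustration of the proximity breach mentioned above, the sketch below measures how tightly the numerical sensitive values in one published group cluster together. It is not the MNSACM method: the risk measure (the largest fraction of values lying within a tolerance `eps` of any single value), the per-attribute check, and the names `proximity_risk` and `has_proximity_breach` are all assumptions introduced for this example.

```python
def proximity_risk(values, eps):
    """Largest fraction of values in a non-empty group within eps of one value.

    A fraction near 1.0 means an adversary can place an individual's value
    in a short interval with high confidence, even without learning it exactly.
    """
    n = len(values)
    return max(sum(1 for y in values if abs(x - y) <= eps) for x in values) / n


def has_proximity_breach(group, numeric_attrs, eps, max_fraction):
    """Flag a group if any numerical sensitive attribute is too concentrated.

    group: list of dict records (hypothetical layout)
    eps: dict mapping attribute name to its tolerance
    max_fraction: highest acceptable proximity risk (an assumed threshold)
    """
    return any(
        proximity_risk([rec[a] for rec in group], eps[a]) > max_fraction
        for a in numeric_attrs
    )
```

A group whose salaries are 5000, 5050, and 5100 contains three distinct values, yet with a tolerance of 200 every record falls in the same narrow range, so the group is flagged even though no exact value is revealed.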
Funding: Supported by the National Natural Science Foundation of China (No. 61170232), the 985 Project Funding of Sun Yat-sen University, the State Key Laboratory of Rail Traffic Control and Safety Independent Research (No. RS2012K011), and the Ministry of Education Funds for Innovative Groups (No. 241147529).