
A Data Anonymization Approach Based on Impurity Gain and Hierarchical Clustering (cited by: 6)
Abstract: Data anonymization is an important means of protecting private information when data is published. This paper introduces the basic concepts and application models of data anonymization and discusses the requirements that an anonymized result should satisfy. To resist background knowledge attacks, it proposes a data anonymization approach based on impurity gain and hierarchical clustering. The approach uses impurity to measure the randomness of the sensitive attribute, and it controls the cluster merging process with the constraint that generalization should incur minimal information loss and maximal impurity gain. The anonymized dataset thereby satisfies both the k-anonymity model and the l-diversity model, while the information loss caused by generalization is minimized and the sensitive attribute values in each cluster are distributed as uniformly as possible. The experimental section proposes a method for evaluating anonymization results: it compares the anonymized result with the original data and assesses anonymization quality in terms of average information loss and average impurity. The experimental results verify the effectiveness of the approach.
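Authors: Xiong Ping (熊平), Zhu Tianqing (朱天清)
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2012, No. 7, pp. 1545-1552 (8 pages)
Funding: National Natural Science Foundation of China (70903076); Fundamental Research Funds for the Central Universities (31540911202)
Keywords: privacy preserving; data anonymization; quasi-identifier; hierarchical clustering; information loss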
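
The merging rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the formulas, so the sketch assumes Gini impurity as the impurity measure, normalized interval width of numeric quasi-identifiers as the information loss, distinct l-diversity, and an unweighted score `loss - gain` for choosing which clusters to merge. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def gini_impurity(values):
    """Assumed impurity measure: 0 for one value, higher when values spread evenly."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def qi_ranges(records):
    """Width of each numeric quasi-identifier over the whole table (1.0 if constant)."""
    columns = zip(*(qi for qi, _ in records))
    return [(max(col) - min(col)) or 1.0 for col in columns]


def info_loss(cluster, ranges):
    """Assumed loss: mean normalized interval width after generalizing the cluster."""
    columns = zip(*(qi for qi, _ in cluster))
    widths = [(max(col) - min(col)) / r for col, r in zip(columns, ranges)]
    return sum(widths) / len(widths)


def sensitive(cluster):
    """Sensitive attribute values of the records in a cluster."""
    return [s for _, s in cluster]


def is_safe(cluster, k, l):
    """k-anonymity (size >= k) plus distinct l-diversity (>= l sensitive values)."""
    return len(cluster) >= k and len(set(sensitive(cluster))) >= l


def impurity_gain(a, b):
    """Impurity of the merged cluster minus the size-weighted impurity of its parts."""
    before = (len(a) * gini_impurity(sensitive(a))
              + len(b) * gini_impurity(sensitive(b))) / (len(a) + len(b))
    return gini_impurity(sensitive(a + b)) - before


def anonymize(records, k, l):
    """Greedy bottom-up merging; records are (quasi_identifier_tuple, sensitive_value)."""
    ranges = qi_ranges(records)
    clusters = [[r] for r in records]
    while len(clusters) > 1 and not all(is_safe(c, k, l) for c in clusters):
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            if is_safe(a, k, l) and is_safe(b, k, l):
                continue  # only merges that involve an unsafe cluster are considered
            # Assumed trade-off: low information loss and high impurity gain are preferred.
            score = info_loss(a + b, ranges) - impurity_gain(a, b)
            if best is None or score < best[0]:
                best = (score, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def evaluate(clusters, ranges):
    """Average information loss and average impurity over clusters (assumed unweighted)."""
    avg_loss = sum(info_loss(c, ranges) for c in clusters) / len(clusters)
    avg_impurity = sum(gini_impurity(sensitive(c)) for c in clusters) / len(clusters)
    return avg_loss, avg_impurity
```

Calling `anonymize(records, k, l)` on a list of `((age, zipcode), disease)` tuples returns clusters that can then be passed to `evaluate` together with `qi_ranges(records)`. If the whole table has fewer than k records or fewer than l distinct sensitive values, the single remaining cluster cannot satisfy the constraints, and the sketch returns it as is.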


