A Privacy-Preserving Data Perturbation Algorithm Based on Neighborhood Entropy

Cited by: 16
Abstract: Privacy-preserving microdata publishing is an active topic in data privacy research. Data perturbation is one effective approach: it modifies the original data values at the cost of a small loss in mining accuracy. The key is balancing privacy protection against mining accuracy, which conflict to some extent. For the privacy-preserving clustering problem, a data perturbation method, NETPA, is proposed. By analyzing the relationship between each data point and its neighborhood point set, and drawing on the concept of entropy from information theory, NETPA introduces the notions of neighborhood entropy of attribute and neighboring main attribute. It perturbs the original data by replacing each data point's values on its neighboring main attributes with the mean of those attributes over the points in its k-neighborhood. Theoretical analysis indicates that this strategy keeps the k-nearest-neighbor relations of the original data largely stable, and so maintains its clustering pattern, while effectively avoiding privacy leakage. In the experiments, the DBSCAN and k-LDCHD clustering algorithms are applied to the data before and after perturbation and the results are compared. Results on both real and synthetic datasets show that the clustering results before and after perturbation are highly similar, indicating that NETPA protects the privacy of the original data while preserving its clustering model.
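The perturbation step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formula for neighborhood entropy of attribute or the rule for selecting neighboring main attributes, so both are assumptions here: the entropy is taken as the Shannon entropy of the normalized absolute differences between a point and its k nearest neighbors on one attribute, and a single main attribute per point is chosen as the one with the highest such entropy. The function names (`k_nearest`, `neighborhood_entropy`, `netpa_like_perturb`) are hypothetical and do not come from the paper.

```python
import math

def k_nearest(data, i, k):
    """Indices of the k points closest to data[i] (Euclidean), excluding i."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: math.dist(data[i], data[j]))
    return others[:k]

def neighborhood_entropy(data, i, neighbors, attr):
    """ASSUMED definition (not from the paper): Shannon entropy of the
    normalized absolute differences between point i and its neighbors
    on a single attribute."""
    diffs = [abs(data[j][attr] - data[i][attr]) for j in neighbors]
    total = sum(diffs)
    if total == 0:
        return 0.0
    probs = [d / total for d in diffs if d > 0]
    return sum(-p * math.log2(p) for p in probs)

def netpa_like_perturb(data, k):
    """Replace each point's main-attribute value with the mean of that
    attribute over its k nearest neighbors in the ORIGINAL data.
    ASSUMPTION: one main attribute per point, the one with the highest
    neighborhood entropy (ties go to the lowest attribute index)."""
    perturbed = [list(row) for row in data]  # leave the input untouched
    n_attrs = len(data[0])
    for i in range(len(data)):
        nbrs = k_nearest(data, i, k)
        entropies = [neighborhood_entropy(data, i, nbrs, a)
                     for a in range(n_attrs)]
        main_attr = max(range(n_attrs), key=lambda a: entropies[a])
        perturbed[i][main_attr] = sum(data[j][main_attr] for j in nbrs) / k
    return perturbed

# Two well-separated groups of three points each.
points = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
released = netpa_like_perturb(points, k=2)
```

Because each replaced value is an average taken inside the point's own k-neighborhood, a perturbed point stays close to its original neighbors, which is the intuition behind the abstract's claim that the k-neighborhood relations, and hence density-based clustering results, are largely preserved.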
Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, Peking University Core), 2009, No. 3, pp. 498-504 (7 pages)
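Funding: Natural Science Foundation of Jiangsu Province (BK2006095); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20040286009)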
Keywords: privacy preserving; clustering; neighborhood entropy of attribute; neighboring main attribute; data perturbation