A Privacy-Preserving Data Perturbation Algorithm Based on Neighborhood Entropy
(Original title: 基于邻域属性熵的隐私保护数据干扰方法)

Cited by: 16


Abstract: Privacy-preserving microdata publishing is an active topic in data privacy research. Data perturbation is one effective approach to it: original data values are modified at the cost of a small loss in mining accuracy. The key is the balance between privacy protection and mining accuracy, two goals that conflict to some extent. For the problem of privacy-preserving clustering, a data perturbation algorithm, NETPA, is proposed. The relation between each data object and its neighborhood is analyzed and, drawing on the notion of entropy from information theory, the concepts of neighborhood entropy of attribute and neighboring main attribute are defined. The original dataset is perturbed by replacing each data object's values on its neighboring main attributes with the mean values of the same attributes over the data objects in its k-nearest neighborhood. This protects the original data from disclosure while largely maintaining its k-nearest-neighbor relations. Theoretical analysis indicates that NETPA avoids privacy leakage effectively and maintains the clustering patterns of the original data well. In the experiments, the clustering algorithms DBSCAN and k-LDCHD are applied to the datasets before and after perturbation by NETPA, and the results are compared. On both real and synthetic datasets, the clustering results before and after perturbation are highly similar, which shows that the algorithm is effective and feasible.
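The perturbation step described in the abstract can be sketched roughly as follows. The abstract gives no formulas, so several choices here are assumptions made only for illustration: the entropy of an attribute within a neighborhood is estimated by equal-width binning over the attribute's global range, the "main" attribute is taken to be the single attribute with the lowest such entropy, and neighbors are found by Euclidean distance. The names `netpa_sketch`, `binned_entropy`, and `k_nearest` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def k_nearest(data, i, k):
    """Indices of the k points closest to data[i] (Euclidean), excluding i itself."""
    others = (j for j in range(len(data)) if j != i)
    return sorted(others, key=lambda j: math.dist(data[i], data[j]))[:k]


def binned_entropy(values, lo, hi, bins=4):
    """ASSUMPTION: Shannon entropy of values after equal-width binning on [lo, hi]."""
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant attribute
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def netpa_sketch(data, k=3, bins=4):
    """Return a perturbed copy of data; the input list is left unchanged.

    For each point, the attribute with the lowest neighborhood entropy
    (ASSUMED to play the role of the "main attribute") is replaced by the
    mean of that attribute over the point's k nearest neighbors.
    """
    dims = range(len(data[0]))
    lows = [min(p[a] for p in data) for a in dims]
    highs = [max(p[a] for p in data) for a in dims]
    perturbed = []
    for i, point in enumerate(data):
        # Neighborhoods are always computed on the original, unperturbed data.
        nbrs = k_nearest(data, i, k)
        entropies = [
            binned_entropy([data[j][a] for j in nbrs], lows[a], highs[a], bins)
            for a in dims
        ]
        main = min(dims, key=lambda a: entropies[a])
        new_point = list(point)
        new_point[main] = sum(data[j][main] for j in nbrs) / len(nbrs)
        perturbed.append(new_point)
    return perturbed
```

Calling `netpa_sketch(points, k=2)` on a list of equal-length numeric lists returns a new list in which each point differs from the original in at most one attribute. The paper may select several main attributes per point and may define the entropy differently; this sketch only shows the overall shape of the replacement step.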

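Authors: Ni Weiwei (倪巍伟), Xu Lizhen (徐立臻), Chong Zhihong (崇志宏), Wu Yingjie (吴英杰), Liu Tengteng (刘腾腾), Sun Zhihui (孙志挥)

Affiliation: School of Computer Science and Engineering, Southeast University

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2009, No. 3, pp. 498-504 (7 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: Natural Science Foundation of Jiangsu Province (BK2006095); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20040286009)

Keywords: privacy preserving; clustering; neighborhood entropy of attribute; neighboring main attribute; data perturbation

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]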


References (11)

1. Kantarcioglu M, Jin Jiasun, Clifton C. When do data mining results violate privacy? [C]//Proc of the 10th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2004: 599-604
2. Agrawal R, Srikant R. Privacy-preserving data mining [C]//Proc of the 2000 ACM SIGMOD Conf on Management of Data. New York: ACM, 2000: 439-450
3. Aggarwal G, Feder T, Kenthapadi K, et al. Approximation algorithms for k-anonymity [C]//Proc of ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2007: 67-78
4. Du Yang, Xia Tian, Tao Yufei, et al. On multidimensional k-anonymity with local recoding generalization [C]//Proc of IEEE 23rd Int Conf on Data Engineering. Los Alamitos: IEEE Computer Society, 2007: 1422-1424
5. Tao Yufei, Xiao Xiaokui, Li Jiexing, et al. On anti-corruption privacy preserving publication [C]//Proc of the 24th Int Conf on Data Engineering (ICDE). Los Alamitos: IEEE Computer Society, 2008: 725-734
6. Oliveira S R M, Zaiane O R. Privacy preservation when sharing data for clustering [C]//Proc of the Int Workshop on Secure Data Management in a Connected World. Berlin: Springer, 2004: 67-82
7. Oliveira S R M, Zaiane O R. Privacy-preserving clustering by object similarity-based representation and dimensionality reduction transformation [OL]. [2008-07-29]. http://www.site.uottawa.ca/~zhizhan/ppdmworkshop2004/paper3.pdf, 2004
8. Fung B C, Wang Ke, Wang Lingyu, et al. A framework for privacy preserving cluster analysis [C]//Proc of IEEE Int Conf on Intelligence and Security Informatics. Los Alamitos: IEEE Computer Society, 2008: 46-51
9. Ni Weiwei, Sun Zhihui, Lu Jieping. k-LDCHD: A local density based k-neighborhood clustering algorithm for high dimensional space [J]. Journal of Computer Research and Development, 2005, 42(5): 784-791 (in Chinese)
10. Ester M, Kriegel H P, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise [C]//Proc of the 2nd Int Conf on Knowledge Discovery and Data Mining. Menlo Park, CA: AAAI Press, 1996: 226-231





