K-Prototypes clustering method for local differential privacy (Cited by: 3)
Abstract To protect data privacy while preserving data utility in clustering analysis, a privacy-preserving clustering scheme based on Local Differential Privacy (LDP), called LDPK-Prototypes (LDP K-Prototypes), is proposed. First, users encode the mixed-type dataset. Second, a randomized response mechanism is used to perturb the sensitive data, and after collecting the users' perturbed data, the third party recovers the original dataset to the greatest extent possible. Then, the K-Prototypes clustering algorithm is executed; during clustering, the initial cluster centers are determined by a dissimilarity measure, and a new distance formula is defined using the entropy weight method. Theoretical analysis and experimental results show that, compared with the ODPC (Optimizing and Differentially Private Clustering) algorithm based on Centralized Differential Privacy (CDP), the proposed scheme improves the average accuracy on the Adult and Heart datasets by 2.95% and 12.41% respectively, effectively improving clustering utility. Meanwhile, LDPK-Prototypes enlarges the differences between data points, effectively avoids local optima, and improves the stability of the clustering algorithm.
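The perturb-then-recover step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the encoding or the exact mechanism, so this example assumes generalized (k-ary) randomized response on a single categorical attribute, with the collector recovering value frequencies rather than individual records. The function names `grr_perturb` and `grr_estimate` and the parameter `epsilon` are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """User side (assumed mechanism): keep the true value with probability p,
    otherwise report one of the other k - 1 values uniformly at random."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


def grr_estimate(reports, domain, epsilon):
    """Collector side: unbiased frequency estimate from perturbed reports.

    The observed share of v has expectation f_v * p + (1 - f_v) * q,
    so f_v is estimated as (observed share - q) / (p - q)."""
    k = len(domain)
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)   # probability of reporting the true value
    q = 1.0 / (e + k - 1)  # probability of reporting one specific other value
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


if __name__ == "__main__":
    rng = random.Random(0)
    domain = ["a", "b", "c"]
    # Synthetic data for illustration only, not from the paper's experiments.
    true_values = ["a"] * 12000 + ["b"] * 6000 + ["c"] * 2000
    reports = [grr_perturb(v, domain, 2.0, rng) for v in true_values]
    print(grr_estimate(reports, domain, 2.0))
```

A smaller `epsilon` gives stronger privacy but noisier estimates, which is the privacy/utility trade-off the abstract refers to.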
Authors ZHANG Guopeng; CHEN Xuebin; WANG Haoshi; ZHAI Ran; MA Zheng (College of Science, North China University of Science and Technology, Tangshan Hebei 063210, China; Hebei Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan Hebei 063010, China; Tangshan Key Laboratory of Data Science (North China University of Science and Technology), Tangshan Hebei 063010, China)
Source Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2022, No. 12, pp. 3813-3821 (9 pages)
Funding National Natural Science Foundation of China (U20A20179).
Keywords Local Differential Privacy (LDP); K-Prototypes; randomized response mechanism; entropy weight method; privacy protection