Abstract
To protect data privacy while preserving data utility in cluster analysis, a privacy-preserving clustering scheme based on Local Differential Privacy (LDP), called LDPK-Prototypes (LDP K-Prototypes), is proposed. First, users encode their records from the mixed-type dataset. Second, a randomized response mechanism perturbs the sensitive data, and the third party, after collecting the users' perturbed data, recovers the original dataset as far as possible. Third, the K-Prototypes clustering algorithm is run; during clustering, the initial cluster centers are chosen with a dissimilarity measure, and the distance formula is redefined using the entropy weight method. Theoretical analysis and experimental results show that, compared with the ODPC (Optimizing and Differentially Private Clustering) algorithm based on Centralized Differential Privacy (CDP), the proposed scheme improves average accuracy on the Adult and Heart datasets by 2.95% and 12.41% respectively, effectively improving clustering utility. LDPK-Prototypes also widens the differences between data points, effectively avoids local optima, and improves the stability of the clustering algorithm.
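The perturb-then-recover step in the abstract can be illustrated with a minimal sketch of classic binary randomized response. This is not the paper's exact mechanism: the abstract does not specify the encoding or the recovery procedure, so the single-bit encoding, the keep probability e^ε/(1+e^ε), and the function names `keep_probability`, `perturb_bit`, and `estimate_ones_fraction` are all assumptions made for illustration. The collector here recovers only the aggregate fraction of 1-bits, not individual records.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under epsilon-LDP randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def perturb_bit(bit: int, epsilon: float, rng: random.Random) -> int:
    """User side: report the true bit with probability p, otherwise flip it."""
    p = keep_probability(epsilon)
    return bit if rng.random() < p else 1 - bit


def estimate_ones_fraction(reports, epsilon: float) -> float:
    """Collector side: unbiased estimate of the true fraction of 1-bits.

    If f is the true fraction, E[observed] = p*f + (1-p)*(1-f),
    so f = (observed + p - 1) / (2p - 1). Requires epsilon > 0.
    """
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    estimate = (observed + p - 1.0) / (2.0 * p - 1.0)
    # Clamp, since sampling noise can push the estimate outside [0, 1].
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    true_bits = [1] * 3000 + [0] * 7000  # hypothetical data: 30% ones
    reports = [perturb_bit(b, 1.0, rng) for b in true_bits]
    print(estimate_ones_fraction(reports, 1.0))
```

A smaller ε pushes the keep probability toward 0.5, which strengthens privacy but increases the variance of the recovered estimate. This is the privacy-utility trade-off the scheme aims to manage.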
Authors
ZHANG Guopeng (张国鹏)
CHEN Xuebin (陈学斌)
WANG Haoshi (王豪石)
ZHAI Ran (翟冉)
MA Zheng (马征)
Affiliations: College of Science, North China University of Science and Technology, Tangshan Hebei 063210, China; Hebei Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan Hebei 063010, China; Tangshan Key Laboratory of Data Science (North China University of Science and Technology), Tangshan Hebei 063010, China
Source
Journal of Computer Applications (《计算机应用》)
CSCD
PKU Core (北大核心)
2022, No. 12, pp. 3813-3821 (9 pages)
Funding
National Natural Science Foundation of China (U20A20179).