Abstract
This paper reviews the research background of differential privacy and its basic principles and methods, and analyzes how the k-means algorithm can leak private information. Existing differentially private k-means algorithms select initial cluster centers at random, which lowers the utility of the clustering results. To address this, the paper proposes selecting the initial cluster centers by combining an exponential noise-adding mechanism with density estimation, so that the initial centers are chosen sensibly while the privacy of the sample data is preserved. Experimental results show that the proposed method significantly improves the utility of the clustering results.
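The abstract does not give the details of the selection step, so the following is only a minimal sketch of one way density estimation and the exponential mechanism could be combined to pick initial centers. Everything in it is an assumption made for illustration rather than the authors' algorithm: the data are assumed to be scaled to the unit hypercube, candidates come from a fixed grid, the utility of a candidate is the number of samples within `radius` of it (sensitivity 1), and the budget `epsilon` is split evenly across the `k` selections. The function names `density_scores`, `exponential_mechanism`, and `select_initial_centers` are hypothetical.

```python
import numpy as np

def density_scores(data, candidates, radius):
    # Utility of a candidate = number of samples within `radius` of it.
    # Adding or removing one sample changes each count by at most 1,
    # so this utility has sensitivity 1 (an assumption of this sketch).
    dists = np.linalg.norm(candidates[:, None, :] - data[None, :, :], axis=2)
    return (dists <= radius).sum(axis=1).astype(float)

def exponential_mechanism(scores, epsilon, sensitivity, rng):
    # Sample index i with probability proportional to
    # exp(epsilon * scores[i] / (2 * sensitivity)).
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()  # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(scores), p=probs))

def select_initial_centers(data, k, epsilon, radius=0.2, grid_size=10, seed=None):
    # Assumes `data` has shape (n, dim) with values already scaled to [0, 1].
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    axes = [np.linspace(0.0, 1.0, grid_size)] * dim
    # Data-independent candidate set: a regular grid over the unit hypercube.
    candidates = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, dim)
    scores = density_scores(data, candidates, radius)
    available = np.ones(len(candidates), dtype=bool)
    centers = []
    for _ in range(k):
        pool = np.flatnonzero(available)
        if pool.size == 0:
            raise ValueError("no candidates left; reduce k or radius")
        # Each selection spends epsilon / k of the total privacy budget.
        pick = pool[exponential_mechanism(scores[pool], epsilon / k, 1.0, rng)]
        centers.append(candidates[pick])
        # Drop candidates near the chosen center so later picks spread out.
        available &= np.linalg.norm(candidates - candidates[pick], axis=1) > radius
    return np.array(centers)
```

Under these assumptions, `select_initial_centers(data, k=3, epsilon=1.0)` returns a `(3, dim)` array that could seed a differentially private k-means iteration. Dense regions are favored because their candidates receive exponentially larger selection weights, while the random draw keeps any single record from determining the outcome.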
Authors
Zhao Li (赵莉)
Fu Shifeng (付世凤)
Hunan College of Information, Changsha, Hunan 410200, China
Source
Information & Computer (《信息与电脑》)
2019, No. 14, pp. 49-52 (4 pages)
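Funding
National Natural Science Foundation of China, "Research on Data Query Privacy Protection Technology in Big Data Environments" (Grant No. 61472131)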
Keywords
privacy protection
differential privacy
k-means
clustering algorithm