摘要
差分隐私算法作为当前研究较多的隐私保护机制之一,有着广泛应用。目前有多种基于差分隐私保护的k均值聚类算法,应用场景不一,各有缺陷。以往的算法通过均等划分数据集,构造等宽直方图进行聚类,这会导致没有数据分布的区域也被无差别插入噪声,影响聚类性能。针对这一点,提出了一种新的差分隐私聚类算法DPQTk-means,先通过构建差分隐私四分树,用大小不一的自适应存储桶动态划分数据空间,充分表示数据集同时减少噪声插入,再进行k均值聚类,证明了其满足ε-差分隐私保护。实验结果表明,DPQTk-means算法与以往的差分隐私聚类算法相比具有更好的聚类可用性,且能够在隐私保护水平较高的同时保持稳定的聚类性能。
As one of the most popular privacy protection mechanisms,the differential privacy algorithm has been widely used.At present,there are a variety of k-means clustering algorithms based on differential privacy protection.The application scenarios are different and each has its own defects.There is an algorithm that divides the data set and constructs the equal-width histograms for clustering.This causes the areas without data to be inserted into noise without any difference,which affects clustering performance.To solve this problem,a new differential privacy clustering algorithm DPQTk-means is proposed.By constructing a differential privacy quad tree,the data space is dynamically divided by adaptive buckets of different sizes to fully represent the data set and reduce noise insertion,and then do k-means clustering.It proves that it satisfiesε-differential privacy protection.Experimental results show that the DPQTk-means algorithm has better cluster availability than the previous differential privacy clustering algorithms,and can maintain stable clustering performance while maintaining a high level of privacy protection.
作者
张可铧
成卫青
ZHANG Kehua;CHENG Weiqing(School of Computer,Nanjing University of Posts&Telecommunications,Nanjing 210023,China;Key Laboratory of Computer Network&Information Integration of Ministry of Education,Southeast University,Nanjing 211189,China)
出处
《计算机工程与应用》
CSCD
北大核心
2021年第2期97-103,共7页
Computer Engineering and Applications
基金
国家自然科学基金(61170332)
计算机网络和信息集成教育部重点实验室资助项目(K93-9-2014-04B)。
关键词
差分隐私
四分树
动态划分
K均值
differential privacy
quad tree
dynamic partition
k-means