Abstract
To address the risk of privacy leakage in traditional clustering algorithms, a spectral clustering algorithm based on differential privacy protection is proposed. Under the differential privacy model, the cumulative distribution function is used to generate random noise that follows the Laplace distribution. This noise is added to the sample similarity function computed by the spectral clustering algorithm, perturbing the weights between individual samples and hiding information about them so as to protect privacy. Simulation experiments on UCI datasets show that the algorithm achieves effective clustering within a certain range of information loss while also protecting the clustered data.
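The two steps named in the abstract (drawing Laplace noise through the cumulative distribution function, then perturbing the pairwise similarity weights) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the zero diagonal, and the parameters `sigma`, `epsilon`, and `sensitivity` are assumptions introduced here for illustration, and all function names are hypothetical.

```python
import math
import random


def laplace_from_uniform(u, scale):
    """Map u in (-0.5, 0.5) to a Laplace(0, scale) value via the inverse CDF."""
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def sample_laplace(scale, rng):
    """Draw one Laplace(0, scale) sample by inverse-CDF sampling."""
    u = rng.random() - 0.5          # uniform on [-0.5, 0.5)
    while u == -0.5:                # redraw the endpoint to avoid log(0)
        u = rng.random() - 0.5
    return laplace_from_uniform(u, scale)


def gaussian_similarity(x, y, sigma):
    """Gaussian kernel similarity (an assumed choice of similarity function)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def noisy_similarity_matrix(points, sigma, epsilon, sensitivity, rng):
    """Build a symmetric similarity matrix with Laplace noise on each weight.

    The noise scale sensitivity / epsilon is the usual Laplace-mechanism
    choice; the sensitivity value itself is an assumption of this sketch.
    """
    n = len(points)
    scale = sensitivity / epsilon
    w = [[0.0] * n for _ in range(n)]   # zero diagonal: no self-similarity
    for i in range(n):
        for j in range(i + 1, n):
            value = gaussian_similarity(points[i], points[j], sigma)
            value += sample_laplace(scale, rng)
            w[i][j] = w[j][i] = value   # mirror to keep the matrix symmetric
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
    for row in noisy_similarity_matrix(pts, 1.0, 1.0, 1.0, rng):
        print(row)
```

A smaller `epsilon` gives a larger noise scale, which corresponds to stronger perturbation of the weights and more information loss in the resulting clustering. The perturbed matrix would then be passed to the usual spectral clustering steps (Laplacian, eigenvectors, k-means), which are omitted here.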
Authors
ZHENG Xiaoyao (郑孝遥)
CHEN Dongmei (陈冬梅)
LIU Yuqing (刘雨晴)
YOU Hao (尤浩)
WANG Xiangshun (汪祥舜)
SUN Liping (孙丽萍)
Affiliations: School of Computer and Information, Anhui Normal University, Wuhu, Anhui 241002, China; Anhui Provincial Key Laboratory of Network and Information Security (Anhui Normal University), Wuhu, Anhui 241002, China
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2018, Issue 10, pp. 2918-2922 (5 pages)
Funding
National Natural Science Foundation of China (61772034, 61602009)
Natural Science Foundation of Anhui Province (1808085MF172)
Keywords
differential privacy
spectral clustering
sensitive data
privacy leakage