摘要
采用聚类算法预先处理个人隐私信息实现差分隐私保护,能够减少直接发布直方图数据带来的噪声累积现象,同时减小了直方图因合并方式不同带来的重构误差。针对DP-DBSCAN差分隐私算法存在对数据参数输入敏感问题,将基于密度聚类的OPTICS算法应用于差分隐私保护中,并提出改进的DP-OPTICS差分隐私保护算法,对稀疏型数据集进行压缩处理,对比采用同方差噪声和异方差噪声两种添加噪声方式,考虑攻击者能够攻破隐私信息的概率,确定隐私参数ε的上界,有效平衡了敏感信息的隐私性和数据的可用性之间的关系。将DP-OPTICS算法和基于OPTICS聚类的差分隐私保护算法、DP-DBSCAN算法进行对比,DP-OPTICS算法在时间消耗上介于其余二者之间,但是在取得相同参数的情况下,聚类的稳定性在三者中最好,因此改进后OP-OPTICS差分隐私保护算法总体上是可行的。
Clustering algorithm is used to preprocess personal privacy information in order to achieve differential privacy protection, which can reduce the reconstruction error caused by directly distributing histogram data, and the reconstruction error caused by different combining methods of histogram. Aiming at the problem of sensitivity to input data parameters in DP- DBSCAN ( Differential Privacy-Density-Based Spatial Clustering of Applications with Noise) differential privacy algorithm, the OPTICS (Ordering Points To Identify Clustering Structure) algorithm based on density clustering was applied to differential privacy protection. And an improved differential privacy protection algorithm, called DP-OPTICS (Differential Privacy- Ordering Points To Identify Clustering Structure) was introduced, the sparse dataset was compressed, the same variance noise and different variance noise were used as two noise-adding ways by comparison, considering the probability of privacy information's being broken by the attacker, the upper bound of privacy parameter 8 was determined, which effectively balanced the relationship between the privacy of sensitive information and the usability of data. The DP-OPTICS algorithm was compared with the differential privacy protection algorithm based on OPTICS clustering and DP-DBSCAN algorithm. The DP-OtrrlCS algorithm is between the other two in time consumption. However, in the case of having the same parameters, the stability of the DP-OPTICS algorithm is the best among them, so the improved OP-OPTICS differential privacy protection algorithm is generally feasible.
出处
《计算机应用》
CSCD
北大核心
2018年第1期73-78,共6页
journal of Computer Applications
基金
国家自然科学基金资助项目(61462009)
广西民族大学科研基金资助项目(2014MDYB029)
广西民族大学研究生科研创新资助项目(gxun-chxps201671)
广西民族大学中国-东盟研究中心(广西科学实验中心)2014年度开放课题项目(TD201404)~~
关键词
聚类算法
个人隐私
重构误差
差分隐私保护
OPTICS算法
clustering algorithm
personal privacy
reconstruction error
differential privacy protection
Ordering PointsTo Identify Clustering Structure (OPTICS) algorithm