Abstract
To address the problem that existing differentially private high-dimensional data publishing methods cannot effectively balance the complex correlations among attributes against computational cost, a clustering-based differentially private high-dimensional data publishing method, PrivBC, is proposed. First, an attribute clustering method is designed based on K-means++: the maximal information coefficient is introduced to quantify the correlation between attributes, and highly correlated attributes are clustered together. Second, for each data subset produced by the clustering, a correlation matrix is computed to reduce the candidate space of attribute pairs, and a Bayesian network satisfying differential privacy is constructed. Finally, each attribute is sampled according to the Bayesian networks, and a new private dataset is synthesized for publishing. Compared with the PrivBayes method, PrivBC reduces the misclassification rate and the running time by 12.6% and 30.2% on average, respectively. Experimental results show that the proposed method significantly improves computational efficiency while maintaining data utility, offering a new approach to the private publishing of high-dimensional data.
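The first step of the abstract (grouping highly correlated attributes before any Bayesian network is built) can be illustrated with a minimal sketch. It is not the authors' implementation: normalized mutual information is swapped in for the maximal information coefficient so the example needs only the standard library, the K-means++ idea is reduced to its distance-weighted seeding followed by a single assignment pass, and the names normalized_mi and cluster_attributes are hypothetical.

```python
import math
import random
from collections import Counter


def normalized_mi(x, y):
    """Mutual information of two discrete columns, scaled to [0, 1].

    Assumption: used here as a simple stand-in for the maximal
    information coefficient named in the abstract.
    """
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    hx = -sum(c / n * math.log(c / n) for c in px.values())
    hy = -sum(c / n * math.log(c / n) for c in py.values())
    mi = sum(
        c / n * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )
    denom = min(hx, hy)
    return min(1.0, mi / denom) if denom > 0 else 0.0


def cluster_attributes(columns, k, seed=0):
    """Group attribute names into at most k clusters.

    columns maps an attribute name to its list of discrete values.
    Distance between attributes is 1 - normalized_mi (hypothetical choice).
    """
    rng = random.Random(seed)
    names = list(columns)
    dist = {}
    for a in names:
        for b in names:
            dist[(a, b)] = 0.0 if a == b else 1.0 - normalized_mi(columns[a], columns[b])

    # K-means++-style seeding: each new center is drawn with probability
    # proportional to its squared distance from the nearest chosen center.
    centers = [rng.choice(names)]
    while len(centers) < k:
        weights = [min(dist[(a, c)] for c in centers) ** 2 for a in names]
        if sum(weights) == 0:
            break  # every attribute already coincides with a center
        centers.append(rng.choices(names, weights=weights)[0])

    # Single assignment pass: each attribute joins its nearest center.
    clusters = {c: [] for c in centers}
    for a in names:
        nearest = min(centers, key=lambda c: dist[(a, c)])
        clusters[nearest].append(a)
    return list(clusters.values())


if __name__ == "__main__":
    a = [0, 0, 1, 1, 0, 0, 1, 1]
    c = [0, 1, 0, 1, 0, 1, 0, 1]
    data = {"a": a, "b": [1 - v for v in a], "c": c, "d": list(c)}
    print(sorted(sorted(g) for g in cluster_attributes(data, k=2)))
```

In this toy dataset, b is a deterministic function of a, d copies c, and a is independent of c, so the two dependent pairs should end up in separate clusters. A differentially private Bayesian network would then be built per cluster, which this sketch does not attempt.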
Authors
CHEN Hengheng (陈恒恒)
NI Zhiwei (倪志伟)
ZHU Xuhui (朱旭辉)
JIN Yuanyuan (金媛媛)
CHEN Qian (陈千)
School of Management, Hefei University of Technology, Hefei, Anhui 230009, China; Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education (Hefei University of Technology), Hefei, Anhui 230009, China
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2021, Issue 9, pp. 2578-2585 (8 pages)
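Funding
National Natural Science Foundation of China (91546108, 61806068)
Anhui Province Science and Technology Major Project (201903a05020020)
Natural Science Foundation of Anhui Province (1908085QG298)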
Keywords
differential privacy
high-dimensional data
attribute clustering
Bayesian network
data publishing