Differential privacy high-dimensional data publishing method via clustering analysis
Cited by: 7
Abstract  Existing differential privacy methods for publishing high-dimensional data struggle to account for the complex correlations among attributes while keeping computational cost manageable. To address this, a differential privacy high-dimensional data publishing method based on clustering analysis, named PrivBC, is proposed. First, an attribute clustering method is designed on the basis of K-means++: the maximal information coefficient (MIC) is introduced to quantify the correlation between attributes, and highly correlated attributes are clustered together. Second, for each data subset produced by the clustering, a correlation matrix is computed to shrink the candidate space of attribute pairs, and a Bayesian network satisfying differential privacy is constructed. Finally, each attribute is sampled according to the Bayesian network, and a new private dataset is synthesized for publishing. Compared with the PrivBayes method, PrivBC reduces the misclassification rate and the running time by 12.6% and 30.2% on average, respectively. Experimental results show that the proposed method significantly improves computational efficiency while effectively preserving data utility, and offers a new approach to the private publishing of high-dimensional data.
Authors  CHEN Hengheng; NI Zhiwei; ZHU Xuhui; JIN Yuanyuan; CHEN Qian (School of Management, Hefei University of Technology, Hefei, Anhui 230009, China; Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education (Hefei University of Technology), Hefei, Anhui 230009, China)
Source  Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2021, No. 9, pp. 2578-2585 (8 pages)
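Funding  National Natural Science Foundation of China (91546108, 61806068); Anhui Provincial Science and Technology Major Project (201903a05020020); Anhui Provincial Natural Science Foundation (1908085QG298).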
Keywords  differential privacy; high-dimensional data; attribute clustering; Bayesian network; data publishing
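The abstract's last two stages (building a differentially private model, then sampling a synthetic dataset from it) rest on a simpler building block that can be sketched in a few lines. The example below is not PrivBC and is not taken from the paper: it does not cluster attributes or learn a Bayesian network. It only illustrates, under stated assumptions, how one joint distribution over a small group of attributes might be perturbed with the Laplace mechanism and then sampled. The function names, the toy records, and the choice of spending the whole budget `epsilon` on a single marginal are all hypothetical.

```python
import random
from collections import Counter
from itertools import product


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_joint_distribution(records, domains, epsilon, rng):
    """Return a Laplace-perturbed joint distribution over a group of attributes.

    records : list of tuples, one value per attribute (toy data here)
    domains : list of lists, the public value domain of each attribute
    epsilon : privacy budget spent on this one marginal (an assumption)
    """
    counts = Counter(records)
    # Replacing one record moves one unit of count between two cells,
    # so the L1 sensitivity of the count histogram is 2.
    scale = 2.0 / epsilon
    noisy = {}
    # Iterate over the full public domain, not only the observed cells,
    # so that empty cells also receive noise.
    for cell in product(*domains):
        value = counts.get(cell, 0) + laplace_noise(scale, rng)
        noisy[cell] = max(value, 0.0)  # clip negative counts to zero
    total = sum(noisy.values())
    if total == 0.0:
        # Every cell was clipped to zero: fall back to a uniform distribution.
        return {cell: 1.0 / len(noisy) for cell in noisy}
    return {cell: value / total for cell, value in noisy.items()}


def sample_records(distribution, n, rng):
    """Draw n synthetic records from the perturbed joint distribution."""
    cells = list(distribution)
    weights = [distribution[cell] for cell in cells]
    return rng.choices(cells, weights=weights, k=n)


# Toy usage with made-up records over two categorical attributes.
rng = random.Random(0)
data = [("yes", "high")] * 60 + [("no", "low")] * 40
domains = [["yes", "no"], ["high", "low"]]
dist = noisy_joint_distribution(data, domains, epsilon=1.0, rng=rng)
synthetic = sample_records(dist, n=100, rng=rng)
```

Clipping and renormalizing are post-processing steps, so they do not consume additional privacy budget. A method along the lines the abstract describes would apply noise of this kind to the conditional distributions of a Bayesian network within each attribute cluster rather than to one flat table, which is what keeps the number of noisy cells manageable in high dimensions.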
