
Differentially Private High-Dimensional Binary Data Publication via Attribute Segmentation

Cited by: 5
Abstract: As the number of attributes in a data set grows, differentially private methods for publishing high-dimensional data generally incur higher time costs and inject more noise; high-dimensional binary data in particular are easily swamped by excessive noise. To address the private publication of high-dimensional binary data, this paper proposes PrivSCBN (differentially private spectral clustering Bayesian network), an efficient, low-noise publication method. First, using the Jaccard distance, the method partitions the attribute set with a spectral clustering algorithm that satisfies differential privacy, then segments the original data set according to that partition, thereby reducing dimensionality. Second, drawing on dynamic programming combined with the exponential mechanism, it uses a fast, differentially private construction algorithm to build a Bayesian network for each segmented subset. Finally, it exploits the value characteristics of conditional probabilities on binary data to add noise to the conditional distributions extracted from the Bayesian networks, and it limits the amount of noise by bounding each network's maximum in-degree. Experiments on three real high-dimensional binary data sets verify the efficiency and utility of PrivSCBN.
Authors: Hong Jinxin (洪金鑫); Wu Yingjie (吴英杰); Cai Jianping (蔡剑平); Sun Lan (孙岚) (College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108; College of Information and Smart Electromechanical Engineering, Xiamen Huaxia University, Xiamen, Fujian 361024)
Source: Journal of Computer Research and Development (《计算机研究与发展》); indexed in EI, CSCD, and the Peking University Core Journals list; 2022, No. 1, pp. 182-196 (15 pages)
Funding: Natural Science Foundation of Fujian Province (2017J01754, 2018J01797).
Keywords: differential privacy; high-dimensional binary data publication; Bayesian network; attribute division; dynamic programming; conditional distribution
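
The abstract names three standard building blocks: the Jaccard distance between binary attributes, the exponential mechanism, and noise added to conditional distributions. The Python sketch below illustrates generic textbook versions of each; it is not the PrivSCBN algorithm itself, whose details the abstract does not give. All function names are hypothetical, the Laplace mechanism is assumed as the noise source, and the noise scale assumes add/remove-one-record neighboring data sets (count sensitivity 1).

```python
import math
import random


def jaccard_distance(col_a, col_b):
    """Jaccard distance between two binary attribute columns (lists of 0/1)."""
    both = sum(1 for a, b in zip(col_a, col_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(col_a, col_b) if a == 1 or b == 1)
    # Assumption: two all-zero columns are treated as identical (distance 0).
    return 0.0 if either == 0 else 1.0 - both / either


def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng):
    """Pick one candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    top = max(scores)  # shifting by the max keeps exp() numerically stable
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_binary_conditional(count_one, count_zero, epsilon, rng):
    """Noisy estimate of P(X = 1 | one parent configuration) from two counts.

    Assumption: Laplace noise with scale 1/epsilon is added to each count,
    negative results are clamped to zero, and the pair is renormalized.
    """
    n1 = max(count_one + laplace_noise(1.0 / epsilon, rng), 0.0)
    n0 = max(count_zero + laplace_noise(1.0 / epsilon, rng), 0.0)
    total = n1 + n0
    # Fall back to the uniform distribution if both noisy counts vanish.
    return 0.5 if total == 0.0 else n1 / total


if __name__ == "__main__":
    rng = random.Random(0)
    print(jaccard_distance([1, 1, 0, 0], [1, 0, 1, 0]))
    print(exponential_mechanism(["A", "B", "C"], [0.2, 0.9, 0.5], 1.0, 1.0, rng))
    print(noisy_binary_conditional(30, 10, 1.0, rng))
```

Because the child attribute is binary, only P(X = 1 | parents) needs to be stored per parent configuration; P(X = 0 | parents) follows as its complement. A node with k binary parents has 2^k parent configurations, which is one way to see why bounding the maximum in-degree limits the total noise.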
