Abstract
Differential privacy methods based on taxonomy-tree partitioning can effectively protect the release of set-valued data, but they do not fully exploit the characteristics of the set-valued dataset itself when constructing the taxonomy tree. By analyzing the factors that influence the amount of added noise, this paper proposes a set-valued data release method based on dataset characteristics. The method first analyzes the dataset and then dynamically constructs the taxonomy tree according to two ratios: the number of distinct record types in the dataset relative to the total output domain (IOR), and the number of record types that occur only once relative to the total output domain (SIOR). Experimental results show that when a dataset satisfies IOR ≤ 40% and SIOR ∈ (5%, 20%], the method makes effective use of the dataset's characteristics to construct a better taxonomy tree and adds less than 10% noise.
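The two dataset statistics that drive the tree construction can be illustrated with a short Python sketch. It assumes that the output domain is the set of all non-empty subsets of the item universe (2^|I| - 1 itemsets) and that IOR and SIOR are the two ratios described above. The helper names (`output_domain_size`, `dataset_ratios`, `in_favourable_range`) are hypothetical, and this is not the authors' implementation: it only computes the statistics and checks the thresholds reported in the abstract, without building the taxonomy tree or adding noise.

```python
from collections import Counter
from typing import Iterable, Tuple


def output_domain_size(num_items: int) -> int:
    """Assumed output domain: all non-empty subsets of an item universe of size num_items."""
    return 2 ** num_items - 1


def dataset_ratios(
    records: Iterable[Iterable[str]], universe: Iterable[str]
) -> Tuple[float, float]:
    """Return (IOR, SIOR) under the assumed definitions (hypothetical helper).

    IOR  = number of distinct records / output domain size
    SIOR = number of distinct records that occur exactly once / output domain size
    """
    items = frozenset(universe)
    # Each record is treated as an unordered itemset, so ["a", "b"] and ["b", "a"] coincide.
    counts = Counter(frozenset(r) for r in records)
    for itemset in counts:
        if not itemset or not itemset <= items:
            raise ValueError(f"record {sorted(itemset)} is empty or outside the universe")
    domain = output_domain_size(len(items))
    distinct = len(counts)
    once = sum(1 for c in counts.values() if c == 1)
    return distinct / domain, once / domain


def in_favourable_range(ior: float, sior: float) -> bool:
    """Threshold check taken from the abstract: IOR <= 40% and SIOR in (5%, 20%]."""
    return ior <= 0.40 and 0.05 < sior <= 0.20


if __name__ == "__main__":
    universe = ["a", "b", "c", "d"]  # 15 possible non-empty itemsets
    records = [["a"], ["a", "b"], ["b", "a"], ["c"], ["a", "b", "c"], ["a"], ["d"]]
    ior, sior = dataset_ratios(records, universe)
    # prints: IOR=0.333, SIOR=0.200, favourable=True
    print(f"IOR={ior:.3f}, SIOR={sior:.3f}, favourable={in_favourable_range(ior, sior)}")
```

In the toy dataset there are 5 distinct itemsets out of 15 possible ones, 3 of which occur exactly once, so IOR = 1/3 and SIOR = 0.2, which falls inside the range the abstract reports as favourable.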
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (PKU Core)
2015, Issue 8, pp. 2420-2424, 2436 (6 pages in total)
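Funding
Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ13415)
Key Project of the Research Fund of Jiangxi University of Science and Technology (NSFJ2014-K11)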
Keywords
taxonomy tree
differential privacy
set-valued data
dataset characteristics