Abstract
This paper studies how to incorporate differential privacy protection into decision tree analysis on incomplete data sets. It first briefly introduces the differentially private ID3 decision tree algorithm and the differentially private random decision tree algorithm. It then addresses the shortcomings of these algorithms and proposes a differentially private random decision tree algorithm based on the exponential mechanism. Finally, for incomplete data sets, it proposes a new missing-value handling method called WP (Weight Partition), which requires no imputation and lets decision tree analysis satisfy differential privacy while achieving higher prediction accuracy and better adaptability. Experimental results show that the proposed method applies to both differentially private ID3 decision trees and differentially private random decision trees, under either the Laplace mechanism or the exponential mechanism.
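The abstract names three building blocks without giving their details: Laplace-noised counts, exponential-mechanism selection, and weight-based handling of missing values. The following is a minimal sketch of the standard textbook forms of the first two, plus one possible reading of weight partitioning. The function names (`noisy_count`, `exponential_mechanism`, `partition_with_weights`) are hypothetical, and the fractional-weight rule is an assumption borrowed from C4.5-style missing-value handling, not the WP procedure as defined in the paper.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(count, epsilon, rng):
    # Laplace mechanism for a counting query (assumed sensitivity 1).
    return count + laplace_noise(1.0 / epsilon, rng)


def exponential_mechanism(scores, epsilon, sensitivity, rng):
    # Pick a candidate with probability proportional to
    # exp(epsilon * score / (2 * sensitivity)).
    # Subtracting the top score avoids overflow and leaves the
    # normalized probabilities unchanged.
    top = max(scores.values())
    candidates = list(scores)
    weights = [
        math.exp(epsilon * (scores[c] - top) / (2.0 * sensitivity))
        for c in candidates
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]


def partition_with_weights(records, attr):
    # ASSUMPTION (not from the paper): a record whose `attr` is missing
    # (None) is sent to every branch with a fractional weight equal to
    # that branch's share of the known-value weight, so no value is
    # imputed. `records` is a list of (row_dict, weight) pairs.
    parts = {}
    for row, w in records:
        value = row.get(attr)
        if value is not None:
            parts.setdefault(value, []).append((row, w))
    total_known = sum(w for branch in parts.values() for _, w in branch)
    if total_known == 0:
        return {}
    shares = {
        value: sum(w for _, w in branch) / total_known
        for value, branch in parts.items()
    }
    for row, w in records:
        if row.get(attr) is None:
            for value, share in shares.items():
                parts[value].append((row, w * share))
    return parts
```

In a private tree builder, the weighted branch sizes would be perturbed with `noisy_count` at the leaves, and `exponential_mechanism` would choose the split attribute from per-attribute utility scores. The branch shares in this sketch are computed from raw counts, so it illustrates the bookkeeping only and makes no privacy claim on its own.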
Source
《计算机科学》 (Computer Science)
CSCD
PKU Core (北大核心)
2017, No. 6, pp. 139-143, 149 (6 pages)
Keywords
Differential privacy
Incomplete data sets
ID3 decision tree algorithm
Random decision tree algorithm