Abstract
Privacy preservation and feature selection are both important in data mining, so how to select features effectively while preserving privacy is an active research topic. Within the Map-Reduce distributed framework, this paper combines differential privacy and principal component analysis with statistics including entropy, misclassification gain, and the Gini index to propose a new privacy-preserving feature selection algorithm for distributed environments. The algorithm protects the privacy of both the data sets and the features. Simulation results on several benchmark data sets indicate that the algorithm performs well: it selects important features effectively while protecting private information to a certain extent.
Source
Journal of Nanjing Normal University (Engineering and Technology Edition)
CAS
2012, Issue 3, pp. 60-67 (8 pages)
Funding
National Natural Science Foundation of China (61073114)
Keywords
privacy preserving, feature selection, distributed, differential privacy, principal component analysis