Abstract
Privacy-preserving classification mining has become one of the active topics in data mining in recent years. The central research problem is how to perturb the original, real data and then build a decision tree on the perturbed data set. This paper proposes a novel privacy-preserving classification mining algorithm based on a transition probability matrix. The algorithm handles non-character data (Boolean, categorical, and numeric types) and original data with a non-uniform distribution, and it can also perturb the class label attribute. Experimental results show that the decision tree built by this algorithm on the perturbed data achieves classification accuracy comparable to that of a decision tree built by a non-privacy-preserving algorithm on the original data.
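The abstract does not spell out the algorithm itself, so the following is only a generic sketch of the underlying idea, not the authors' method. It assumes a categorical attribute whose value c_i is replaced by c_j with probability P[i][j], and it estimates the original value counts from the perturbed ones by solving the linear system P^T x = y. Counts of this kind are what a decision tree's split criterion would need. All names here (`perturb`, `solve`, `estimate_original_counts`, the example matrix and categories) are hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def perturb(values, categories, P, rng):
    """Replace each value c_i by c_j drawn with probability P[i][j]."""
    index = {c: i for i, c in enumerate(categories)}
    return [rng.choices(categories, weights=P[index[v]])[0] for v in values]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("transition matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def estimate_original_counts(perturbed, categories, P):
    """Estimate original category counts from perturbed data.

    The expected perturbed counts satisfy y = P^T x, where x holds the
    original counts, so x is recovered by solving P^T x = y.
    """
    counts = Counter(perturbed)
    y = [counts[c] for c in categories]
    n = len(categories)
    P_transposed = [[P[i][j] for i in range(n)] for j in range(n)]
    return solve(P_transposed, y)


# Illustrative (assumed) setup: three categories, 80% chance to keep a value.
categories = ["low", "mid", "high"]
P = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
rng = random.Random(0)
original = ["low"] * 6000 + ["mid"] * 3000 + ["high"] * 1000
perturbed = perturb(original, categories, P, rng)
estimate = estimate_original_counts(perturbed, categories, P)
print([round(e) for e in estimate])
```

The estimate is only approximate for a finite sample, and the matrix must be invertible. A matrix closer to the identity gives better estimates but less privacy, which is the trade-off any perturbation scheme of this kind has to balance.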
Source
《计算机研究与发展》
EI
CSCD
PKU Core Journals (北大核心)
2006, No. 1, pp. 39-45 (7 pages)
Journal of Computer Research and Development
Funding
National Natural Science Foundation of China (69933010, 60303008)
National High Technology Research and Development Program of China (863 Program) (2002AA4Z3430)
Keywords
data mining
classification
decision tree
privacy preserving
transition probability matrix