Abstract
This paper analyzes the transformation probability that affects the accuracy of the privacy-preserving data mining algorithm in [1], and gives an expression relating the transformation probability to the accuracy of that algorithm. Computations show that, for a data set of 10000 transactions, when the sample used to derive the expression exceeds 10% of the data set, the relative error of the expression is no more than 6%. Here the relative error compares the error produced by the randomized-response-based mining algorithm, when it is run with the transformation probability computed from the expression, against the error that was intended. Computations also show that the relative error of the expression decreases as the size of the data set increases. Hence the mining algorithm is suitable for practical problems.
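To illustrate why the transformation probability governs accuracy, the following is a minimal sketch of a generic Warner-style randomized response scheme for a single binary item. It is not the expression or the algorithm from [1], which the abstract does not reproduce. The function names (`randomize`, `estimate_support`, `relative_error`) and the assumption that each bit is kept with probability `p` and flipped otherwise are introduced here for illustration only.

```python
import random


def randomize(bits, p, rng):
    """Keep each bit with probability p, flip it otherwise (assumed scheme)."""
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_support(randomized_bits, p):
    """Estimate the true fraction of 1s from the randomized bits.

    If pi is the true fraction, the expected observed fraction is
    pi * p + (1 - pi) * (1 - p); solving for pi gives the estimator below.
    """
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("p = 0.5 carries no information about the data")
    observed = sum(randomized_bits) / len(randomized_bits)
    return (observed - (1 - p)) / (2 * p - 1)


def relative_error(n, true_support, p, seed=0):
    """Relative error of the estimate on a synthetic data set of n bits."""
    rng = random.Random(seed)
    ones = round(n * true_support)
    bits = [1] * ones + [0] * (n - ones)
    estimate = estimate_support(randomize(bits, p, rng), p)
    return abs(estimate - true_support) / true_support


if __name__ == "__main__":
    # Hypothetical settings: 10000 transactions, item support 0.3.
    for p in (0.6, 0.7, 0.8, 0.9):
        print(p, relative_error(10000, 0.3, p))
```

In this sketch the estimator divides by `2 * p - 1`, so the sampling noise is amplified as `p` approaches 0.5 and shrinks as the number of transactions grows. This is consistent in spirit with the abstract's observation that the error decreases with data set size, though the specific figures reported there come from the paper's own expression.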
Source
Microcomputer Applications (《微计算机应用》)
2007, No. 7, pp. 696-700 (5 pages)
Funding
Supported by the Natural Science Foundation of Gansu Province (Grant No. 3ZS051-A25-037)
Keywords
randomized response, association rule, data mining, transformation probability