Abstract
Privacy preservation is an important research problem in data mining. Its goal is to obtain accurate models and analysis results without precise access to the true original data. To improve both the degree of privacy protection and the accuracy of the mining results, this paper presents an effective method for privacy-preserving association rule mining. First, the two basic privacy-preserving strategies, data perturbation and query restriction, are combined into a new data randomization method, Randomized Response with Partial Hiding (RRPH), which transforms and hides the original data. Then, for data processed by RRPH, a simple and efficient frequent itemset generation algorithm is given, which in turn enables privacy-preserving association rule mining. Both the theoretical analysis and the experimental results indicate that the RRPH-based method performs well in terms of privacy, accuracy, efficiency, and applicability.
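As an illustration only, the following Python sketch shows one way a randomized response scheme with partial hiding could be set up for a single binary item. The abstract does not give the RRPH parameters, so the three probabilities `p1`, `p2`, `p3` (keep the true value, force it to 1, force it to 0) and both function names are assumptions made for this sketch, not the paper's definition. Under these assumptions the expected fraction of ones in the perturbed data is λ = p1·π + p2, so the true support π can be estimated as (λ − p2) / p1.

```python
import random


def rrph_perturb(bit, p1, p2, p3, rng):
    """Perturb one binary value (hypothetical scheme, assumed for this sketch).

    With probability p1 the true bit is reported, with probability p2 the
    value is forced to 1, and with probability p3 it is forced to 0, so the
    true value is hidden in the last two cases.
    """
    if abs(p1 + p2 + p3 - 1.0) > 1e-9:
        raise ValueError("p1 + p2 + p3 must equal 1")
    u = rng.random()
    if u < p1:
        return bit          # report the true value
    if u < p1 + p2:
        return 1            # hide the true value behind a 1
    return 0                # hide the true value behind a 0


def estimate_support(perturbed_bits, p1, p2):
    """Estimate the true fraction of ones from perturbed data.

    Inverts E[fraction of ones] = p1 * support + p2.
    """
    if p1 <= 0:
        raise ValueError("p1 must be positive to recover the support")
    observed = sum(perturbed_bits) / len(perturbed_bits)
    return (observed - p2) / p1


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic column in which about 30% of transactions contain the item.
    true_bits = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]
    noisy = [rrph_perturb(b, 0.6, 0.2, 0.2, rng) for b in true_bits]
    print("true support:     ", sum(true_bits) / len(true_bits))
    print("estimated support:", estimate_support(noisy, 0.6, 0.2))
```

The sketch covers only single-item support. Estimating the support of larger itemsets, which frequent itemset generation requires, would need the joint distribution of the perturbed items and is not attempted here.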
Source
《软件学报》
EI
CSCD
Peking University Core Journals
2006, Issue 8, pp. 1764-1774 (11 pages)
Journal of Software
Funding
National Natural Science Foundation of China
Keywords
privacy preservation
data mining
association rule
frequent itemset
randomized response