Frequent pattern mining discovers sets of items that frequently appear together in a transactional database; these can serve valuable economic and research purposes. However, if the database contains sensitive data (...Frequent pattern mining discovers sets of items that frequently appear together in a transactional database; these can serve valuable economic and research purposes. However, if the database contains sensitive data (e.g., user behavior records, electronic health records), directly releas- ing the discovered frequent patterns with support counts will carry significant risk to the privacy of individuals. In this pa- per, we study the problem of how to accurately find the top-k frequent patterns with noisy support counts on transactional databases while satisfying differential privacy. We propose an algorithm, called differentially private frequent pattern (DFP- Growth), that integrates a Laplace mechanism and an expo- nential mechanism to avoid privacy leakage. We theoretically prove that the proposed method is (λ, δ)-useful and differ- entially private. To boost the accuracy of the returned noisy support counts, we take consistency constraints into account to conduct constrained inference in the post-processing step. Extensive experiments, using several real datasets, confirm that our algorithm generates highly accurate noisy support counts and top-k frequent patterns.展开更多
基金This research was partially supported by the National Natural Science Foundation of China (Grant Nos. 61379050, 91224008), the National 863 High-tech Program (2013AA013204), Specialized Research Fund for the Doctoral Program of Higher Education(20130004130001)
文摘Frequent pattern mining discovers sets of items that frequently appear together in a transactional database; these can serve valuable economic and research purposes. However, if the database contains sensitive data (e.g., user behavior records, electronic health records), directly releas- ing the discovered frequent patterns with support counts will carry significant risk to the privacy of individuals. In this pa- per, we study the problem of how to accurately find the top-k frequent patterns with noisy support counts on transactional databases while satisfying differential privacy. We propose an algorithm, called differentially private frequent pattern (DFP- Growth), that integrates a Laplace mechanism and an expo- nential mechanism to avoid privacy leakage. We theoretically prove that the proposed method is (λ, δ)-useful and differ- entially private. To boost the accuracy of the returned noisy support counts, we take consistency constraints into account to conduct constrained inference in the post-processing step. Extensive experiments, using several real datasets, confirm that our algorithm generates highly accurate noisy support counts and top-k frequent patterns.