摘要
K-anonymisation is an approach to protecting individuals from being identified from data. Good k-anonymisations should retain data utility and preserve privacy, but few methods have considered these two conflicting requirements together. In this paper, we extend our previous work on a clustering-based method for balancing data utility and privacy protection, and propose a set of heuristics to improve its effectiveness. We introduce new clustering criteria that treat utility and privacy on equal terms and propose sampling-based techniques to optimally set up its parameters. Extensive experiments show that the extended method achieves good accuracy in query answering and is able to prevent linking attacks effectively.
K-anonymisation is an approach to protecting individuals from being identified from data. Good k-anonymisations should retain data utility and preserve privacy, but few methods have considered these two conflicting requirements together. In this paper, we extend our previous work on a clustering-based method for balancing data utility and privacy protection, and propose a set of heuristics to improve its effectiveness. We introduce new clustering criteria that treat utility and privacy on equal terms and propose sampling-based techniques to optimally set up its parameters. Extensive experiments show that the extended method achieves good accuracy in query answering and is able to prevent linking attacks effectively.