Funding: partly supported by the National Natural Science Foundation of China (Nos. 61632010 and 61602129)
Abstract: Privacy-preserving data release is an important problem for reconciling data openness with individual privacy. The state-of-the-art approach to privacy-preserving data release is differential privacy, which offers a strong privacy guarantee without restrictive assumptions about an attacker's background knowledge. For genomic data with extremely high-dimensional attributes, however, current approaches based on differential privacy are not effective. Specifically, a large amount of noise must be injected into genomic data containing tens of millions of SNPs (Single Nucleotide Polymorphisms), which would significantly degrade the utility of the released data. To address this problem, this paper proposes a genomic data release method with a differential privacy guarantee. By executing belief propagation on a factor graph, our method factorizes the distribution of sensitive genomic data into a set of local distributions. After injecting differential-privacy noise into these local distributions, synthetic sensitive data are obtained by sampling from the noisy distributions. The synthetic sensitive data and the factor graph are then used to construct an approximate distribution of the non-sensitive data. Finally, non-sensitive genomic data are sampled from the approximate distribution to construct a synthetic genomic dataset.
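The pipeline summarized in the abstract proceeds in three stages: factorize the sensitive distribution into local distributions via belief propagation on a factor graph, perturb each local distribution with differential-privacy noise, and sample synthetic records from the perturbed distributions. The following is a minimal sketch of only the perturbation-and-sampling stage, assuming the Laplace mechanism and per-SNP genotype count tables as the local distributions; it omits the factor-graph/belief-propagation factorization, and all function names are illustrative rather than taken from the paper.

```python
import numpy as np

def noisy_local_distribution(counts, epsilon, sensitivity=1.0):
    """Perturb a local count table with Laplace noise (scale = sensitivity/epsilon),
    clip negative entries, and renormalize into a valid probability distribution."""
    noisy = counts + np.random.laplace(0.0, sensitivity / epsilon, size=counts.shape)
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    if total == 0:
        noisy = np.ones_like(noisy)  # fall back to uniform if all mass is clipped away
        total = noisy.sum()
    return noisy / total

def sample_synthetic_snps(local_dists, n_records, n_states=3):
    """Sample synthetic SNP records column by column from the noisy local
    distributions; genotypes are coded 0/1/2 (a common SNP encoding)."""
    synthetic = np.empty((n_records, len(local_dists)), dtype=np.int8)
    for j, dist in enumerate(local_dists):
        synthetic[:, j] = np.random.choice(n_states, size=n_records, p=dist)
    return synthetic

# Illustrative example: 5 sensitive SNPs, genotype counts over 1000 individuals,
# and a per-SNP privacy budget of epsilon = 0.5 (all values are hypothetical).
counts = [np.random.multinomial(1000, [0.5, 0.3, 0.2]).astype(float) for _ in range(5)]
noisy_dists = [noisy_local_distribution(c, epsilon=0.5) for c in counts]
synthetic_data = sample_synthetic_snps(noisy_dists, n_records=1000)
print(synthetic_data.shape)  # (1000, 5)
```

In the actual method, the local distributions come from the factor-graph factorization rather than per-SNP marginals, so each noisy table would be a joint distribution over a small group of correlated SNPs; the noise-then-sample pattern shown here is the same.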