Funding: supported in part by the "Pioneer" and "Leading Goose" R&D Program of Zhejiang (Grant No. 2022C03174), the National Natural Science Foundation of China (No. 92067103), the Key Research and Development Program of Shaanxi, China (No. 2021ZDLGY06-02), the Natural Science Foundation of Shaanxi Province (No. 2019ZDLGY12-02), the Shaanxi Innovation Team Project (No. 2018TD-007), the Xi'an Science and Technology Innovation Plan (No. 201809168CX9JC10), the Fundamental Research Funds for the Central Universities (No. YJS2212), and the National 111 Program of China (No. B16037).
Abstract: The security of Federated Learning (FL) / Distributed Machine Learning (DML) is gravely threatened by data poisoning attacks, which destroy the usability of the model by contaminating training samples; such attacks are therefore called causative availability indiscriminate attacks. Because existing data sanitization methods are hard to apply to real-time applications owing to their tedious processes and heavy computation, we propose a new supervised batch detection method for poison that can rapidly sanitize the training dataset before local model training. We design a training-dataset generation method that helps to enhance accuracy and uses data-complexity features to train a detection model, which is then used in an efficient batch hierarchical detection process. Our model accumulates knowledge about poison, which can be expanded by retraining to adapt to new attacks. Being neither attack-specific nor scenario-specific, our method is applicable to FL/DML as well as other online or offline scenarios.
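As an illustration only (not the authors' actual detector), the idea of screening whole training batches with a supervised, data-complexity-based test can be sketched as follows. The specific feature (nearest-neighbour label disagreement, a classic complexity measure that label-flip poison tends to inflate) and the midpoint threshold rule are assumptions for this sketch:

```python
import math

def nn_disagreement(batch):
    """Fraction of samples whose nearest neighbour (squared Euclidean
    distance) carries a different label -- a simple data-complexity
    feature; label-flip poisoning tends to raise it."""
    score = 0
    for i, (xi, yi) in enumerate(batch):
        best_label, best_d = None, math.inf
        for j, (xj, yj) in enumerate(batch):
            if i == j:
                continue
            d = sum((a - b) ** 2 for a, b in zip(xi, xj))
            if d < best_d:
                best_d, best_label = d, yj
        score += best_label != yi
    return score / len(batch)

def fit_threshold(clean_batches, poisoned_batches):
    """Supervised step: place the threshold midway between the mean
    feature value of known-clean and known-poisoned batches."""
    c = sum(map(nn_disagreement, clean_batches)) / len(clean_batches)
    p = sum(map(nn_disagreement, poisoned_batches)) / len(poisoned_batches)
    return (c + p) / 2

def sanitize(batches, threshold):
    """Keep only batches whose complexity score falls below the threshold."""
    return [b for b in batches if nn_disagreement(b) < threshold]
```

Because the test operates on whole batches rather than individual samples, it can run before each round of local training; retraining `fit_threshold` on newly labelled batches corresponds to expanding the detector's knowledge of poison.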
Funding: This work was supported in part by the National Natural Science Foundation of China (No. 61932006), in part by the National Key R&D Program of China (No. 2018AAA0100101), and in part by the Chongqing Technology Innovation and Application Development Project (No. cstc2020jscx-msxmX0156).
Abstract: Minable data publication is ubiquitous, since it benefits data sharing and trading among commercial companies and further facilitates the development of data-driven tasks. Unfortunately, minable data publication is often carried out by publishers with limited privacy concern, so the published dataset is minable by malicious entities as well. This hinders minable data publication, since the published data may contain sensitive information. Thus, approaches and technologies for reducing privacy leakage risks are urgently needed. To this end, in this paper we propose an optimized sanitization approach for minable data publication (named SA-MDP). SA-MDP supports the association-rule mining function while providing privacy protection for specific rules. In SA-MDP, we consider the trade-off between data utility and data privacy in the minable data publication problem. To address this problem, SA-MDP designs a customized particle swarm optimization (PSO) algorithm, where the optimization objective is determined by both data utility and data privacy. Specifically, we take advantage of PSO to produce new particles, which is achieved by random mutation or by learning from the best particle; hence, SA-MDP can avoid solutions being trapped in local optima. Besides, we design a proper fitness function to guide the particles toward the optimal solution. Additionally, we present a preprocessing method, applied before the evolution process of the customized PSO algorithm, to improve the convergence rate. Finally, the proposed SA-MDP approach is evaluated on several datasets. The experimental results demonstrate the effectiveness and efficiency of SA-MDP.
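A minimal sketch of the idea, not the paper's actual algorithm: particles encode which transactions to keep, new particles arise by random mutation or by learning from the best particle (as the abstract describes), and the fitness rewards utility (transactions kept) while heavily penalizing any sensitive itemset that remains frequent. The penalty form, parameters, and greedy acceptance rule are assumptions of this sketch:

```python
import random

def support(dataset, itemset, mask):
    """Support of `itemset` among the transactions kept by `mask`."""
    kept = [t for t, keep in zip(dataset, mask) if keep]
    if not kept:
        return 0.0
    return sum(itemset <= t for t in kept) / len(kept)

def fitness(dataset, mask, sensitive, min_sup):
    """Utility = fraction of transactions kept; subtract a heavy penalty
    while any sensitive itemset is still frequent (privacy violated)."""
    hidden = all(support(dataset, s, mask) < min_sup for s in sensitive)
    utility = sum(mask) / len(mask)
    return utility if hidden else utility - 1.0

def sanitize(dataset, sensitive, min_sup, particles=20, iters=200, seed=0):
    """PSO-style search over binary keep/drop masks."""
    rng = random.Random(seed)
    n = len(dataset)
    swarm = [[rng.random() < 0.9 for _ in range(n)] for _ in range(particles)]
    score = lambda m: fitness(dataset, m, sensitive, min_sup)
    best = max(swarm, key=score)
    for _ in range(iters):
        for i, m in enumerate(swarm):
            child = list(m)
            j = rng.randrange(n)
            if rng.random() < 0.5:
                child[j] = not child[j]   # random mutation
            else:
                child[j] = best[j]        # learn from the best particle
            if score(child) > score(m):   # greedy acceptance
                swarm[i] = child
        best = max(swarm, key=score)
    return [t for t, keep in zip(dataset, best) if keep]
```

The mutation branch keeps the swarm exploring (avoiding local optima), while the learn-from-best branch pulls particles toward the incumbent solution, mirroring the exploration/exploitation balance the abstract attributes to the customized PSO.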
Abstract: The rapid progress and plummeting costs of human-genome sequencing make large amounts of personal biomedical information available, raising one of the most important concerns: genomic data privacy. Since personal biomedical data are highly correlated among relatives, the increasing availability of genomes and personal traits online (whether leaked unwittingly or released intentionally to genetic service platforms) threatens kin-genomic data privacy. We propose new inference attacks, based on probabilistic graphical models and belief propagation, to predict unknown Single Nucleotide Polymorphisms (SNPs) and human traits of individuals in a familial genomic dataset. With this method, an adversary can predict the unobserved genomes or traits of targeted individuals in a family genomic dataset in which some individuals' genomes and traits are observed, relying on SNP-trait associations from Genome-Wide Association Studies (GWAS), Mendel's Laws, and statistical relations between SNPs. Existing genome inference methods have relatively high computational complexity when the input contains tens of millions of SNPs and human traits. We then propose an approach to publish genomic data with a differential privacy guarantee. After finding an approximate distribution of the input genomic dataset using Bayesian networks, a noisy distribution is obtained by injecting noise into the approximate distribution. Finally, a synthetic genomic dataset is sampled from the noisy distribution, and it is proved that any query on the synthetic dataset satisfies the differential privacy guarantee.
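The publication pipeline's last two steps (noise injection and synthetic sampling) can be sketched in a heavily simplified form. This sketch replaces the Bayesian-network model with a plain genotype histogram and assumes the standard Laplace mechanism (sensitivity 1 for a count histogram); the function names and parameters are illustrative, not the paper's:

```python
import math
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) by inverse-CDF transform."""
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def dp_histogram(counts, epsilon, rng):
    """Laplace mechanism on a count histogram (sensitivity 1), then clamp
    at zero and renormalise -- both are post-processing steps, so the
    epsilon-DP guarantee of the noisy counts is preserved."""
    noisy = {k: max(0.0, c + laplace(1.0 / epsilon, rng))
             for k, c in counts.items()}
    total = sum(noisy.values()) or 1.0
    return {k: v / total for k, v in noisy.items()}

def sample_synthetic(dist, n, rng):
    """Draw a synthetic dataset of n records from the noisy distribution."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=n)
```

Since the synthetic records are sampled only from the noisy distribution and never touch the raw data again, any downstream query over them inherits the differential privacy guarantee by the post-processing property, which is the structure of the argument the abstract sketches.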