From fraud detection to speech recognition, including price prediction, Machine Learning (ML) applications are manifold and can significantly improve different areas. Nevertheless, machine learning models are vulnerable and are exposed to different security and privacy attacks. Hence, these issues should be addressed while using ML models to preserve the security and privacy of the data used. There is a need to secure ML models, especially in the training phase, to preserve the privacy of the training datasets and to minimise information leakage. In this paper, we present an overview of ML threats and vulnerabilities, and we highlight current progress in research works proposing defence techniques against ML security and privacy attacks. The relevant background for the different attacks occurring in both the training and testing/inference phases is introduced before presenting a detailed overview of Membership Inference Attacks (MIA) and the related countermeasures. We then introduce a countermeasure against MIA on Convolutional Neural Networks (CNN) based on dropout and L2 regularization. Through experimental analysis, we demonstrate that this defence technique can mitigate the risk of MIA while maintaining acceptable model accuracy. Indeed, training CNN models on the CIFAR-10 and CIFAR-100 datasets, we empirically verify the ability of our defence strategy to decrease the impact of MIA on the model, and we compare the results of five different classifiers. Moreover, we present a solution that achieves a trade-off between model performance and the mitigation of MIA.
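The abstract above does not include code; as a rough illustration of the defence it describes, the sketch below combines dropout layers and L2 regularization (weight decay) in a small CIFAR-10-style CNN using PyTorch. The architecture, dropout rate, and weight-decay value are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch: a CIFAR-10 CNN regularized with dropout and L2 (weight decay).
# Hyperparameters and architecture are illustrative only, not the paper's setup.
import torch.nn as nn
import torch.optim as optim

class RegularizedCNN(nn.Module):
    def __init__(self, num_classes=10, dropout_rate=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32 x 16 x 16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64 x 8 x 8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout_rate),             # dropout limits memorization of individual training points
            nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = RegularizedCNN()
# weight_decay adds an L2 penalty on the weights, shrinking the train/test confidence gap MIA exploits.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()
```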
Deep learning can train models from a dataset to solve tasks. Although deep learning has attracted much interest owing to its excellent performance, security issues are gradually being exposed. In particular, deep learning models may be prone to the membership inference attack, in which the attacker can determine whether a given sample belongs to the training set. In this paper, we propose a new defense mechanism against membership inference: NoiseDA. In our proposal, a model is not directly trained on the sensitive dataset; instead, the threat of membership inference is alleviated by leveraging domain adaptation. Besides, a module called Feature Crafter has been designed to reduce the number of required training datasets from two to one: it creates the features for domain-adaptation training using noise-additive mechanisms. Our experiments have shown that, with noise properly added by the Feature Crafter, our proposal can reduce the success of membership inference with a controllable utility loss.
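The exact design of NoiseDA's Feature Crafter is not given above; the sketch below is one plausible reading under assumptions: Gaussian noise is added to extracted feature vectors to craft a second, noisy domain from a single sensitive dataset, which can then serve as one side of a standard domain-adaptation objective. The function name craft_noisy_domain, the noise scale, and the toy data are hypothetical.

```python
# Hypothetical sketch of a "Feature Crafter"-style step: derive a second, noisy
# feature domain from one sensitive dataset so domain-adaptation training does not
# touch the raw sensitive features directly. The noise scale is a privacy/utility knob.
import numpy as np

def craft_noisy_domain(features, noise_scale=0.1, seed=None):
    """Return a perturbed copy of the feature matrix (one row per sample)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=noise_scale, size=features.shape)
    return features + noise

# Toy usage: 1000 samples with 128-dimensional features from some upstream extractor.
sensitive_features = np.random.randn(1000, 128)
noisy_features = craft_noisy_domain(sensitive_features, noise_scale=0.2, seed=0)
# `sensitive_features` plays the source domain and `noisy_features` the target domain
# (or vice versa) in a standard domain-adaptation objective such as DANN.
```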
Nowadays, machine learning (ML) algorithms cannot succeed without the availability of an enormous amount of training data. The data could contain sensitive information, which needs to be protected. Membership inference attacks attempt to find out whether a target data point was used to train a certain ML model, which has security and privacy implications. The leakage of membership information can vary from one machine-learning algorithm to another. In this paper, we conduct an empirical study to explore the performance of membership inference attacks against four machine learning algorithms, namely K-nearest neighbors, random forest, support vector machine, and logistic regression, using three datasets. Our experiments reveal which machine learning model is most immune to privacy attacks. Additionally, we examine the effects of such attacks when varying the dataset size. Based on our observations of the experimental results, we propose a defense mechanism that is less prone to privacy attacks and demonstrate its effectiveness through an empirical evaluation.
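The abstract does not fix an attack implementation; a minimal baseline in the same spirit is the confidence-threshold attack sketched below against a scikit-learn random forest: a sample is guessed to be a training member when the model's top predicted probability exceeds a threshold. The synthetic data and the 0.9 threshold are illustrative assumptions.

```python
# Minimal confidence-threshold membership-inference baseline against a scikit-learn model.
# Overfitted models tend to be more confident on training members than on non-members.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

target_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def confidence_attack(model, samples, threshold=0.9):
    """Guess 'member' (1) when the model's top class probability exceeds the threshold."""
    top_prob = model.predict_proba(samples).max(axis=1)
    return (top_prob > threshold).astype(int)

members = confidence_attack(target_model, X_train)      # true membership label: 1
non_members = confidence_attack(target_model, X_test)   # true membership label: 0
attack_acc = 0.5 * (members.mean() + (1 - non_members).mean())  # balanced attack accuracy
print(f"balanced membership-inference accuracy: {attack_acc:.3f}")
```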
Membership inference attacks on machine learning models have drawn significant attention. While current research primarily utilizes shadow-modeling techniques, which require knowledge of the target model and its training data, practical scenarios involve black-box access to the target model with no such information available. Limited training data further complicates the implementation of these attacks. In this paper, we experimentally compare common data-enhancement schemes and propose a data synthesis framework based on the variational autoencoder generative adversarial network (VAE-GAN) to extend the training data for shadow models. We also propose a shadow-model training algorithm based on adversarial training to improve the shadow model's ability to mimic the predicted behavior of the target model when the target model's information is unknown. By conducting attack experiments on different models under the black-box access setting, we verify the effectiveness of the VAE-GAN-based data synthesis framework for improving the accuracy of membership inference attacks. Furthermore, we verify that the shadow model trained with the adversarial training approach mimics the predicted behavior of the target model more closely. Compared with existing methods, the method proposed in this paper achieves a 2% improvement in attack accuracy and delivers better attack performance.
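The paper's adversarial training algorithm and VAE-GAN synthesis are not reproduced here; the sketch below shows a simpler, distillation-style stand-in for the shadow-model idea: the shadow network is trained to match the black-box target's predicted probabilities on synthetic queries via a KL-divergence loss. The query_target placeholder, the network sizes, and the random "synthetic" data are assumptions for illustration only.

```python
# Sketch: training a shadow model to mimic a black-box target's predicted probabilities
# on synthetic data. This is a distillation-style stand-in, not the paper's adversarial
# training algorithm; `query_target` is a placeholder for black-box API access.
import torch
import torch.nn as nn
import torch.nn.functional as F

def query_target(x):
    """Placeholder: return the target model's probability vectors for a batch."""
    return F.softmax(torch.randn(x.size(0), 10), dim=1)  # stand-in for a real API call

shadow = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(shadow.parameters(), lr=1e-3)

synthetic_data = torch.randn(512, 32)  # e.g., produced by a VAE-GAN-style generator
for epoch in range(5):
    for i in range(0, synthetic_data.size(0), 64):
        batch = synthetic_data[i:i + 64]
        target_probs = query_target(batch)                      # black-box queries
        shadow_log_probs = F.log_softmax(shadow(batch), dim=1)
        loss = F.kl_div(shadow_log_probs, target_probs, reduction="batchmean")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```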
Membership inference (MI) attacks mainly aim to infer whether a data record was used to train a target model or not. Due to the serious privacy risks, MI attacks have been attracting a tremendous amount of attention in the research community. One existing work conducted, to the best of our knowledge, the first dedicated survey in this specific area, providing a comprehensive review of the literature during the period 2017-2021 (over 100 papers). However, due to the tremendous amount of progress made in this area since 2021 (176 papers), that survey has unfortunately become very limited in two respects. (1) Although the entire literature published since 2017 covers 18 ways to categorize the proposed MI attacks, the 2017-2021 literature reviewed in the existing work only covered 5 of them. With 13 ways missing, the existing survey covers only 27% of the landscape (in terms of how to categorize MI attacks) when a retrospective view is taken. (2) Because the 2017-2021 literature covers only 27% of that landscape, the number of new insights (i.e., explanations of why an MI attack could succeed) behind the proposed MI attacks has been increasing significantly since 2021. Although none of the previous works made these insights a main focus of their studies, we found that the various insights leveraged in the literature can be broken down into 10 groups; without making the insights a main focus, a survey could fail to help researchers gain adequate intellectual depth in this area of research. In this work, we conduct a systematic study to address these limitations. To address the first limitation, we make the 13 newly emerged ways to categorize MI attacks a main focus of the study. To address the second limitation, we provide, to the best of our knowledge, the first review of the various insights leveraged in the entire literature, organized into the 10 groups identified above. Moreover, our survey provides a comprehensive review of the existing defenses against MI attacks, the existing applications of MI attacks, the widely used datasets (e.g., 107 new datasets), and the evaluation metrics (e.g., 20 new evaluation metrics).
The recent interest in the deployment of Generative AI applications that use large language models (LLMs) has brought to the forefront significant privacy concerns, notably the leakage of Personally Identifiable Information (PII) and other confidential or protected information that may have been memorized during training, specifically during a fine-tuning or customization process. We describe different black-box attacks from potential adversaries and study their impact on the amount and type of information that may be recovered from commonly used and deployed LLMs. Our research investigates the relationship between PII leakage, memorization, and factors such as model size, architecture, and the nature of attacks employed. The study utilizes two broad categories of attacks: PII leakage-focused attacks (auto-completion and extraction attacks) and memorization-focused attacks (various membership inference attacks). The findings from these investigations are quantified using an array of evaluative metrics, providing a detailed understanding of LLM vulnerabilities and the effectiveness of different attacks.
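The specific attacks studied above are not reproduced here; a common memorization signal they build on is the loss-threshold test sketched below for a causal LM with Hugging Face transformers: candidate strings with unusually low per-token loss are more likely to have been seen during training or fine-tuning. The gpt2 model name, the example string, and any decision threshold are illustrative assumptions.

```python
# Sketch of a simple loss-based membership/memorization signal for a causal LM:
# texts with unusually low per-token loss are more likely to have been memorized.
# The model name and example text are illustrative, not tied to the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the signal applies to any causal LM exposing logits
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def per_token_loss(text):
    enc = tokenizer(text, return_tensors="pt")
    out = model(**enc, labels=enc["input_ids"])  # causal-LM loss = mean NLL per token
    return out.loss.item()

candidate = "Jane Doe's phone number is 555-0100."
score = per_token_loss(candidate)
print(f"per-token loss: {score:.3f}  (lower => stronger membership/memorization signal)")
```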
The recent interest in the deployment of Generative AI applications that use large language models (LLMs) has brought to the forefront significant privacy concerns, notably the leakage of Personally Identifiable Information (PII) and other confidential or protected information that may have been memorized during training, specifically during a fine-tuning or customization process. This inadvertent leakage of sensitive information typically occurs when the models are subjected to black-box attacks. To address the growing concerns of safeguarding private and sensitive information while simultaneously preserving its utility, we analyze the performance of Targeted Catastrophic Forgetting (TCF). TCF protects targeted pieces of sensitive information within datasets through an iterative pipeline, which significantly reduces the likelihood of such information being leaked or reproduced by the model during black-box attacks, such as the autocompletion attack in our case. The experiments conducted using TCF clearly demonstrate its capability to reduce the extraction of PII while still preserving the context and utility of the target application.
In recent years, with the continuous advancement of intelligence in the Internet of Vehicles (IoV), the problem of privacy leakage in IoV has become increasingly prominent, and research on privacy protection for the IoV has become a major focus. This paper analyzes the advantages and disadvantages of existing location privacy protection architectures and algorithms, proposes a privacy protection architecture based on an untrusted data collection server, and designs a vehicle location acquisition algorithm based on local differential privacy and a game model. The algorithm first meshes the road network space. Then, a dynamic game model is introduced into the user location privacy protection model and the attacker location semantic inference model, thereby minimizing the possibility of exposing the regional semantic privacy of the k-location set while maximizing the availability of the service. On this basis, a statistical method is designed that satisfies local differential privacy for k-location sets and obtains unbiased estimates of traffic density in different regions. Finally, the algorithm is verified on a dataset of mobile vehicles in Shanghai. The experimental results show that the algorithm can guarantee the user's location privacy and location semantic privacy while satisfying service quality requirements, providing better privacy protection and service for users of the IoV.
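The game-theoretic location selection is beyond a short example, but the local-differential-privacy building block the abstract relies on can be sketched: k-ary randomized response over the grid cells of the meshed road network, followed by the standard unbiased frequency estimator for per-cell traffic density. The grid size, privacy budget, and simulated locations below are assumptions.

```python
# Minimal sketch: k-ary randomized response over road-network grid cells (epsilon-LDP),
# with the standard unbiased estimator of per-cell traffic density.
# The game-theoretic choice of a k-location set from the paper is not modeled here.
import numpy as np

def perturb_cell(true_cell, k, epsilon, rng):
    """Report the true grid cell with probability p, otherwise a uniformly random other cell."""
    p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < p:
        return true_cell
    other = rng.integers(0, k - 1)
    return other if other < true_cell else other + 1  # skip the true cell

def estimate_density(reports, k, epsilon):
    """Unbiased estimate of the number of vehicles per cell from perturbed reports."""
    n = len(reports)
    p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    q = 1.0 / (np.exp(epsilon) + k - 1)
    counts = np.bincount(reports, minlength=k)
    return (counts - n * q) / (p - q)

rng = np.random.default_rng(0)
k, epsilon = 100, 1.0                        # 100 grid cells, privacy budget epsilon = 1
true_cells = rng.integers(0, k, size=10000)  # simulated vehicle locations
reports = np.array([perturb_cell(c, k, epsilon, rng) for c in true_cells])
print(estimate_density(reports, k, epsilon)[:5])  # approximates the true per-cell counts
```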
Empirical attacks on Federated Learning (FL) systems indicate that FL is fraught with numerous attack surfaces throughout the FL execution. These attacks can not only cause models to fail in specific tasks, but also infer private information. While previous surveys have identified the risks, listed the attack methods available in the literature, or provided a basic taxonomy to classify them, they mainly focused on the risks in the training phase of FL. In this work, we survey the threats, attacks and defenses to FL throughout the whole FL process in three phases: the Data and Behavior Auditing Phase, the Training Phase and the Predicting Phase. We further provide a comprehensive analysis of these threats, attacks and defenses, and summarize their issues and taxonomy. Our work considers the security and privacy of FL from the viewpoint of the FL execution process. We highlight that establishing a trusted FL requires adequate measures to mitigate security and privacy threats at each phase. Finally, we discuss the limitations of current attack and defense approaches and provide an outlook on promising future research directions in FL.
Many data-sharing applications require that published data protect sensitive information pertaining to individuals, such as the diseases of patients, the credit rating of a customer, or the salary of an employee. Meanwhile, certain information is required to be published. In this paper, we consider data-publishing applications where the publisher specifies both the sensitive information and the information to be shared. An adversary can infer the real value of a sensitive entry with high confidence from the published data. The goal is to protect sensitive information in the presence of data inference using association rules derived from the published data. We formulate the inference attack framework and develop complexity results. We show that computing a safe partial table is an NP-hard problem. We classify the general problem into subcases based on the requirements for the published information, and propose algorithms for finding a safe partial table to publish. We have conducted an empirical study to evaluate these algorithms on real data. The test results show that the proposed algorithms can produce approximately maximal published data and improve on the performance of existing algorithms.
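As a toy illustration of the inference threat described above (not of the paper's algorithms), the sketch below shows how a high-confidence association rule derived from published rows can reveal a suppressed sensitive entry for an individual matching the rule's antecedent. All attribute names and values are fabricated for illustration.

```python
# Toy illustration (not the paper's algorithms): a high-confidence association rule
# derived from published rows lets an adversary infer a suppressed sensitive entry.
# All values below are fabricated for illustration.
published_rows = [
    {"zip": "47906", "age_group": "30-39", "disease": "flu"},
    {"zip": "47906", "age_group": "30-39", "disease": "flu"},
    {"zip": "47906", "age_group": "30-39", "disease": "flu"},
    {"zip": "47906", "age_group": "30-39", "disease": "cold"},
    {"zip": "47907", "age_group": "40-49", "disease": "diabetes"},
]

def rule_confidence(rows, antecedent, consequent):
    """Confidence of antecedent => consequent over the published rows."""
    matches = [r for r in rows if all(r[a] == v for a, v in antecedent.items())]
    if not matches:
        return 0.0
    hits = [r for r in matches if all(r[c] == v for c, v in consequent.items())]
    return len(hits) / len(matches)

# The adversary knows a target individual lives in zip 47906, age 30-39, but that
# individual's disease value was suppressed in the published partial table.
conf = rule_confidence(published_rows,
                       {"zip": "47906", "age_group": "30-39"},
                       {"disease": "flu"})
print(f"Pr[disease = flu | zip = 47906, age 30-39] = {conf:.2f}")  # 0.75: high-confidence inference
```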
Latent Dirichlet allocation (LDA) is a topic model widely used for discovering hidden semantics in massive text corpora. Collapsed Gibbs sampling (CGS), a widely used algorithm for learning the parameters of LDA, carries a risk of privacy leakage. Specifically, the word count statistics and updates of latent topics in CGS, which are essential for parameter estimation, could be employed by adversaries to conduct effective membership inference attacks (MIAs). To date, two kinds of methods have been exploited in CGS to defend against MIAs: adding noise to word count statistics and utilizing inherent privacy. Both have their respective limitations. Noise sampled from the Laplacian distribution sometimes produces negative word count statistics, which severely degrade parameter estimation in CGS. Utilizing inherent privacy provides only weak guaranteed privacy when defending against MIAs. It is therefore promising to propose an effective framework that obtains accurate parameter estimates with guaranteed differential privacy. The key issue in obtaining accurate parameter estimates when introducing differential privacy into CGS is making good use of the privacy budget so that a precise noise scale is derived. For the first time, Rényi differential privacy (RDP) is introduced into CGS, and we propose RDP-LDA, an effective framework for analyzing the privacy loss of any differentially private CGS. RDP-LDA can be used to derive a tighter upper bound on privacy loss than the overestimated results of existing differentially private CGS obtained by ε-DP. Within RDP-LDA, we propose a novel truncated-Gaussian mechanism that keeps word count statistics non-negative, and we propose distribution perturbation, which provides more rigorous guaranteed privacy than utilizing inherent privacy. Experiments validate that our proposed methods produce more accurate parameter estimates under the JS-divergence metric and yield lower precision and recall when defending against MIAs.
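The calibration of the noise scale to an RDP budget is the paper's contribution and is not reproduced here; the sketch below only shows one plausible reading of a truncated-Gaussian perturbation that keeps word count statistics non-negative, using scipy.stats.truncnorm. The noise scale sigma and the example counts are assumptions.

```python
# Sketch of a truncated-Gaussian perturbation that keeps word-count statistics
# non-negative (one plausible reading of the mechanism; the paper's calibration of
# the noise scale to an RDP budget is not reproduced here).
import numpy as np
from scipy.stats import truncnorm

def perturb_counts(counts, sigma, rng_seed=0):
    """Draw each noisy count from a Gaussian centered at the true count, truncated at 0."""
    counts = np.asarray(counts, dtype=float)
    a = (0.0 - counts) / sigma          # lower truncation bound in standard-deviation units
    b = np.full_like(counts, np.inf)    # no upper bound
    return truncnorm.rvs(a, b, loc=counts, scale=sigma, random_state=rng_seed)

word_counts = np.array([0, 3, 17, 250])        # per-topic word counts from a CGS sweep
noisy = perturb_counts(word_counts, sigma=2.0)
print(noisy)  # all entries stay >= 0, unlike Laplacian noise, which can go negative
```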