Privacy is a critical requirement in distributed data mining. Cryptography-based secure multiparty computation is a main approach for privacy preserving. However, it shows poor performance in large scale distributed s...Privacy is a critical requirement in distributed data mining. Cryptography-based secure multiparty computation is a main approach for privacy preserving. However, it shows poor performance in large scale distributed systems. Meanwhile, data perturbation techniques are comparatively efficient but are mainly used in centralized privacy-preserving data mining (PPDM). In this paper, we propose a light-weight anonymous data perturbation method for efficient privacy preserving in distributed data mining. We first define the privacy constraints for data perturbation based PPDM in a semi-honest distributed environment. Two protocols are proposed to address these constraints and protect data statistics and the randomization process against collusion attacks: the adaptive privacy-preserving summary protocol and the anonymous exchange protocol. Finally, a distributed data perturbation framework based on these protocols is proposed to realize distributed PPDM. Experiment results show that our approach achieves a high security level and is very efficient in a large scale distributed environment.展开更多
Recently, privacy concerns about data collection have received an increasing amount of attention. In data collection process, a data collector (an agency) assumed that all respondents would be comfortable with submi...Recently, privacy concerns about data collection have received an increasing amount of attention. In data collection process, a data collector (an agency) assumed that all respondents would be comfortable with submitting their data if the published data was anonymous. We believe that this assumption is not realistic because the increase in privacy concerns causes some re- spondents to refuse participation or to submit inaccurate data to such agencies. If respondents submit inaccurate data, then the usefulness of the results from analysis of the collected data cannot be guaranteed. Furthermore, we note that the level of anonymity (i.e., k-anonymity) guaranteed by an agency cannot be verified by respondents since they generally do not have access to all of the data that is released. Therefore, we introduce the notion of ki-anonymity, where ki is the level of anonymity preferred by each respondent i. Instead of placing full trust in an agency, our solution increases respondent confidence by allowing each to decide the preferred level of protection. As such, our protocol ensures that respondents achieve their preferred kranonymity during data collection and guarantees that the collected records are genuine and useful for data analysis.展开更多
基金Project supported by the National Natural Science Foundation of China (Nos. 60772098 and 60672068)the New Century Excel-lent Talents in University of China (No. NCET-06-0393)
文摘Privacy is a critical requirement in distributed data mining. Cryptography-based secure multiparty computation is a main approach for privacy preserving. However, it shows poor performance in large scale distributed systems. Meanwhile, data perturbation techniques are comparatively efficient but are mainly used in centralized privacy-preserving data mining (PPDM). In this paper, we propose a light-weight anonymous data perturbation method for efficient privacy preserving in distributed data mining. We first define the privacy constraints for data perturbation based PPDM in a semi-honest distributed environment. Two protocols are proposed to address these constraints and protect data statistics and the randomization process against collusion attacks: the adaptive privacy-preserving summary protocol and the anonymous exchange protocol. Finally, a distributed data perturbation framework based on these protocols is proposed to realize distributed PPDM. Experiment results show that our approach achieves a high security level and is very efficient in a large scale distributed environment.
基金supported by the Basic Research Program through the National Research Foundation of Korea(NRF)funded by the Ministry of Education(No.NRF-2014R1A1A2058695)
文摘Recently, privacy concerns about data collection have received an increasing amount of attention. In data collection process, a data collector (an agency) assumed that all respondents would be comfortable with submitting their data if the published data was anonymous. We believe that this assumption is not realistic because the increase in privacy concerns causes some re- spondents to refuse participation or to submit inaccurate data to such agencies. If respondents submit inaccurate data, then the usefulness of the results from analysis of the collected data cannot be guaranteed. Furthermore, we note that the level of anonymity (i.e., k-anonymity) guaranteed by an agency cannot be verified by respondents since they generally do not have access to all of the data that is released. Therefore, we introduce the notion of ki-anonymity, where ki is the level of anonymity preferred by each respondent i. Instead of placing full trust in an agency, our solution increases respondent confidence by allowing each to decide the preferred level of protection. As such, our protocol ensures that respondents achieve their preferred kranonymity during data collection and guarantees that the collected records are genuine and useful for data analysis.