Data mining is the extraction of interesting patterns or knowledge from huge amounts of data. The initial idea of privacy-preserving data mining (PPDM) was to extend traditional data mining techniques to work with data that has been modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. PPDM considers the problem of running data mining algorithms on confidential data that is not supposed to be revealed even to the party running the algorithm. In contrast, privacy-preserving data publishing (PPDP) is not necessarily tied to a specific data mining task, and the task may be unknown at the time of publishing. PPDP studies how to transform raw data into a version that is immunized against privacy attacks but still supports effective data mining. Both PPDM and PPDP have become increasingly popular because they allow privacy-sensitive data to be shared for analysis. One well-studied approach is the k-anonymity model [1], which in turn led to other models such as confidence bounding, l-diversity, t-closeness, and (α,k)-anonymity. All known mechanisms try to minimize information loss, and this attempt provides a loophole for attacks. The aim of this paper is to present a survey of the most common attack techniques against anonymization-based PPDM and PPDP and to explain their effects on data privacy.
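As an illustration of the models named in this abstract, the following is a minimal sketch (not taken from the surveyed paper) of how k-anonymity and distinct l-diversity could be checked on a generalized table. The function names, the attribute names, and the sample table are all hypothetical. The sample table is 2-anonymous yet fails 2-diversity, which is the kind of gap a homogeneity attack exploits.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        groups[key].append(rec)
    return groups


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    groups = equivalence_classes(records, quasi_identifiers)
    return all(len(g) >= k for g in groups.values())


def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """Distinct l-diversity: every class holds at least l distinct sensitive values."""
    groups = equivalence_classes(records, quasi_identifiers)
    return all(len({r[sensitive] for r in g}) >= l for g in groups.values())


# Hypothetical, already-generalized table (assumption for illustration only).
table = [
    {"age": "30-39", "zip": "476**", "disease": "flu"},
    {"age": "30-39", "zip": "476**", "disease": "flu"},
    {"age": "40-49", "zip": "479**", "disease": "cancer"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
]

k_ok = is_k_anonymous(table, ["age", "zip"], k=2)
l_ok = is_l_diverse(table, ["age", "zip"], "disease", l=2)
```

Here the first equivalence class contains only one sensitive value, so anyone known to be in that class has their diagnosis revealed even though the table is 2-anonymous.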
Publishing big data and making it accessible to researchers is important for knowledge building, as it helps in applying highly efficient methods to plan, conduct, and assess scientific research. However, publishing and processing big data poses a privacy concern: protecting individuals' sensitive information while maintaining the usability of the published data. Several anonymization methods, such as slicing and merging, have been designed as solutions to the privacy concerns of publishing big data. However, the major drawback of merging and slicing is the random permutation procedure, which does not always guarantee complete protection against attribute or membership disclosure. Moreover, merging procedures may generate many fake tuples, leading to a loss of data utility and subsequent erroneous knowledge extraction. This study therefore proposes a slicing-based enhanced method for privacy-preserving big data publishing that maintains data utility. In particular, the proposed method distributes the data into horizontal and vertical partitions. The lower and upper protection levels are then used to identify the unique and identical attribute values. The unique and identical attributes are swapped to ensure that the published big data is protected from disclosure risks. The experimental results demonstrate that the proposed method can maintain data utility and provide stronger privacy preservation.
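For orientation, the baseline slicing idea that this abstract criticizes (vertical partitioning into column groups, horizontal partitioning into buckets, then random permutation within each bucket) might be sketched as follows. This is a generic illustration under assumed names (`slice_table`, `column_groups`, `bucket_size`) and hypothetical data; it is not the enhanced method the paper proposes, whose protection levels and swapping rules are not specified in the abstract.

```python
import random


def slice_table(rows, column_groups, bucket_size, seed=None):
    """Basic slicing sketch.

    rows          : list of dicts, one per tuple
    column_groups : list of attribute-name lists (the vertical partition)
    bucket_size   : number of tuples per bucket (the horizontal partition)
    """
    rng = random.Random(seed)
    sliced = []
    for start in range(0, len(rows), bucket_size):
        bucket = rows[start:start + bucket_size]
        shuffled_columns = []
        for group in column_groups:
            # Project the bucket onto this column group, then permute the
            # projected values independently of the other groups.
            column = [tuple(row[a] for a in group) for row in bucket]
            rng.shuffle(column)
            shuffled_columns.append(column)
        # Re-assemble sliced tuples position by position.
        for i in range(len(bucket)):
            sliced.append(tuple(col[i] for col in shuffled_columns))
    return sliced


# Hypothetical input (assumption for illustration only).
rows = [
    {"age": 22, "sex": "M", "zip": "47906", "disease": "flu"},
    {"age": 22, "sex": "F", "zip": "47906", "disease": "dyspepsia"},
    {"age": 33, "sex": "F", "zip": "47905", "disease": "flu"},
    {"age": 52, "sex": "F", "zip": "47905", "disease": "bronchitis"},
]
groups = [["age", "sex"], ["zip", "disease"]]
published = slice_table(rows, groups, bucket_size=2, seed=0)
```

Because the permutation is random, it can leave a tuple's column values aligned as in the original data, which is one way to read the abstract's remark that random permutation does not always guarantee protection.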
In recent years, mobile Internet technology and location-based services have been widely applied, and application providers and users have accumulated huge amounts of trajectory data. While publishing and analyzing user trajectory data has brought great convenience, the risk of disclosing user privacy through trajectory data publishing is also becoming more and more prominent. Traditional k-anonymous trajectory data publishing technologies cannot effectively protect user privacy against attackers with strong background knowledge. For privacy-preserving trajectory data publishing, we propose a differential privacy based (k-Ψ)-anonymity method to defend against re-identification and probabilistic inference attacks. The proposed method is divided into two phases. In the first phase, a dummy-based (k-Ψ)-anonymous trajectory data publishing algorithm is given, which improves (k-δ)-anonymity by considering changes of the threshold δ on different road segments and constructing an adaptive threshold set Ψ that takes road network information into account. In the second phase, Laplace noise regarding the distance of anonymous locations under differential privacy is used to perturb the trajectories of the anonymous trajectory dataset output by the first phase. Experiments on a real road network dataset show that the proposed method improves trajectory indistinguishability and achieves good data utility while preserving user privacy.
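The second phase described above relies on Laplace noise. The following is a minimal sketch of the generic Laplace mechanism applied to trajectory coordinates, assuming a caller-supplied `sensitivity` and privacy budget `epsilon`. The function names and the sample trajectory are hypothetical, and the paper's distance-based calibration is not reproduced here because the abstract does not give it.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with location 0 and that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_trajectory(points, epsilon, sensitivity, rng=None):
    """Add independent Laplace noise to every coordinate of a trajectory.

    points      : list of (x, y) tuples
    epsilon     : privacy budget, must be positive
    sensitivity : assumed sensitivity of a coordinate query (an assumption
                  here, not a value taken from the paper)
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon  # standard Laplace-mechanism scale
    return [
        (x + laplace_noise(scale, rng), y + laplace_noise(scale, rng))
        for x, y in points
    ]


# Hypothetical trajectory (assumption for illustration only).
trajectory = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)]
noisy = perturb_trajectory(trajectory, epsilon=0.5, sensitivity=1.0,
                           rng=random.Random(42))
```

A smaller `epsilon` gives a larger noise scale and therefore stronger perturbation at the cost of utility, which is the trade-off the abstract refers to.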
With the increasing prevalence of social networks, more and more social network data are published for applications such as social network analysis and data mining. However, this brings privacy problems. For example, adversaries can easily obtain sensitive information about some individuals with little background knowledge. How to publish social network data for analysis while preserving the privacy of individuals has raised many concerns, and many algorithms have been proposed to address this issue. In this paper, we discuss this privacy problem from two aspects: attack models and countermeasures. We analyse privacy concerns, model the background knowledge that an adversary may utilize, and review recently developed attack models. We then survey the state-of-the-art privacy-preserving methods in two categories: anonymization methods and differential privacy methods. We also provide research directions in this area.
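How to ensure that sensitive information is not disclosed when publishing data that involves personal privacy, while maximizing the utility of the published data, is a major challenge in privacy protection. In recent years, researchers in China and abroad have studied privacy-preserving data publishing (PPDP) extensively, and a timely summary of the results can clarify research directions. This paper summarizes the results on privacy preservation in data publishing, introduces the commonly used privacy-preserving models and techniques, privacy metrics, and algorithms, focuses on the applications of PPDP in different scenarios, and points out possible research topics and application prospects for PPDP.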
Funding: This work was supported by the Postgraduate Research Grants Scheme (PGRS), Grant No. PGRS190360.
Funding: Supported by the Fundamental Research Funds for the Central Universities (No. GK201906009), the CERNET Innovation Project (No. NGII20190704), and the Science and Technology Program of Xi'an City (No. 2019216914GXRC005CG006-GXYD5.2).