Publishing big data and making it accessible to researchers is important for knowledge building as it helps in applying highly efficient methods to plan,conduct,and assess scientific research.However,publishing and pr...Publishing big data and making it accessible to researchers is important for knowledge building as it helps in applying highly efficient methods to plan,conduct,and assess scientific research.However,publishing and processing big data poses a privacy concern related to protecting individuals’sensitive information while maintaining the usability of the published data.Several anonymization methods,such as slicing and merging,have been designed as solutions to the privacy concerns for publishing big data.However,the major drawback of merging and slicing is the random permutation procedure,which does not always guarantee complete protection against attribute or membership disclosure.Moreover,merging procedures may generatemany fake tuples,leading to a loss of data utility and subsequent erroneous knowledge extraction.This study therefore proposes a slicingbased enhanced method for privacy-preserving big data publishing while maintaining the data utility.In particular,the proposed method distributes the data into horizontal and vertical partitions.The lower and upper protection levels are then used to identify the unique and identical attributes’values.The unique and identical attributes are swapped to ensure the published big data is protected from disclosure risks.The outcome of the experiments demonstrates that the proposed method could maintain data utility and provide stronger privacy preservation.展开更多
In recent years,with the explosive development in Internet,data storage and data processing technologies,privacy preservation has been one of the greater concerns in data mining.A number of methods and techniques have...In recent years,with the explosive development in Internet,data storage and data processing technologies,privacy preservation has been one of the greater concerns in data mining.A number of methods and techniques have been developed for privacy preserving data mining.This paper provided a wide survey of different privacy preserving data mining algorithms and analyzed the representative techniques for privacy preservation.The existing problems and directions for future research are also discussed.展开更多
Data is humongous today because of the extensive use of World WideWeb, Social Media and Intelligent Systems. This data can be very important anduseful if it is harnessed carefully and correctly. Useful information can...Data is humongous today because of the extensive use of World WideWeb, Social Media and Intelligent Systems. This data can be very important anduseful if it is harnessed carefully and correctly. Useful information can beextracted from this massive data using the Data Mining process. The informationextracted can be used to make vital decisions in various industries. Clustering is avery popular Data Mining method which divides the data points into differentgroups such that all similar data points form a part of the same group. Clusteringmethods are of various types. Many parameters and indexes exist for the evaluationand comparison of these methods. In this paper, we have compared partitioningbased methods K-Means, Fuzzy C-Means (FCM), Partitioning AroundMedoids (PAM) and Clustering Large Application (CLARA) on secure perturbeddata. Comparison and identification has been done for the method which performsbetter for analyzing the data perturbed using Extended NMF on the basis of thevalues of various indexes like Dunn Index, Silhouette Index, Xie-Beni Indexand Davies-Bouldin Index.展开更多
基金This work was supported by Postgraduate Research Grants Scheme(PGRS)with Grant No.PGRS190360.
文摘Publishing big data and making it accessible to researchers is important for knowledge building as it helps in applying highly efficient methods to plan,conduct,and assess scientific research.However,publishing and processing big data poses a privacy concern related to protecting individuals’sensitive information while maintaining the usability of the published data.Several anonymization methods,such as slicing and merging,have been designed as solutions to the privacy concerns for publishing big data.However,the major drawback of merging and slicing is the random permutation procedure,which does not always guarantee complete protection against attribute or membership disclosure.Moreover,merging procedures may generatemany fake tuples,leading to a loss of data utility and subsequent erroneous knowledge extraction.This study therefore proposes a slicingbased enhanced method for privacy-preserving big data publishing while maintaining the data utility.In particular,the proposed method distributes the data into horizontal and vertical partitions.The lower and upper protection levels are then used to identify the unique and identical attributes’values.The unique and identical attributes are swapped to ensure the published big data is protected from disclosure risks.The outcome of the experiments demonstrates that the proposed method could maintain data utility and provide stronger privacy preservation.
基金This work was supported by the National Social Science Foundation Project of China under Grant 16BTQ085.
文摘In recent years,with the explosive development in Internet,data storage and data processing technologies,privacy preservation has been one of the greater concerns in data mining.A number of methods and techniques have been developed for privacy preserving data mining.This paper provided a wide survey of different privacy preserving data mining algorithms and analyzed the representative techniques for privacy preservation.The existing problems and directions for future research are also discussed.
文摘Data is humongous today because of the extensive use of World WideWeb, Social Media and Intelligent Systems. This data can be very important anduseful if it is harnessed carefully and correctly. Useful information can beextracted from this massive data using the Data Mining process. The informationextracted can be used to make vital decisions in various industries. Clustering is avery popular Data Mining method which divides the data points into differentgroups such that all similar data points form a part of the same group. Clusteringmethods are of various types. Many parameters and indexes exist for the evaluationand comparison of these methods. In this paper, we have compared partitioningbased methods K-Means, Fuzzy C-Means (FCM), Partitioning AroundMedoids (PAM) and Clustering Large Application (CLARA) on secure perturbeddata. Comparison and identification has been done for the method which performsbetter for analyzing the data perturbed using Extended NMF on the basis of thevalues of various indexes like Dunn Index, Silhouette Index, Xie-Beni Indexand Davies-Bouldin Index.