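Abstract: [Purpose/Significance] Digital libraries are gradually evolving into smart libraries. The collection and analysis of library data are increasingly put into practice and have contributed to business management and service innovation. However, using data that involves users' private, sensitive information can raise security problems. [Method/Process] Building on an analysis of traditional library data mining methods, this paper draws on PPDM (Privacy-Preserving Data Mining) techniques such as data generalization, sanitization, masking, and distortion. It integrates data mining with business needs and, with the standardized use of user data as its goal, explores user privacy protection mechanisms for smart services and constructs a feasible scheme that combines business implementation with data protection. [Result/Conclusion] Data collection, data publishing, data sharing, and data aggregation in smart libraries can all draw on PPDM methods to protect users' private data. Only by staying closely tied to technological innovation can smart libraries secure service innovation and thereby advance the development of the smart library field.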
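Two of the techniques named in this abstract, generalization and masking, can be illustrated with a minimal sketch. The record layout (`reader_id`, `age`, `title`), the ten-year age bands, and the function names are hypothetical choices made for illustration; the abstract does not specify how the paper applies these techniques.

```python
def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age with a coarse band such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def mask_id(reader_id: str, visible: int = 2) -> str:
    """Masking: keep only the last `visible` characters and hide the rest."""
    visible = max(visible, 0)
    hidden = max(len(reader_id) - visible, 0)
    return "*" * hidden + reader_id[hidden:]


def protect_record(record: dict) -> dict:
    """Return a copy of a (hypothetical) loan record that is safer to share."""
    return {
        "reader_id": mask_id(record["reader_id"]),
        "age_range": generalize_age(record["age"]),
        "title": record["title"],  # non-identifying field kept as-is
    }


record = {"reader_id": "R2023001", "age": 27, "title": "Data Mining Basics"}
protected = protect_record(record)
print(protected)
```

Coarser bands and fewer visible characters give stronger protection at the cost of less useful data, which is the trade-off between business needs and data protection that the abstract refers to.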
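Abstract: Randomization-based data perturbation and reconstruction is one of the most important methods in privacy-preserving data mining (PPDM). However, randomization cannot easily eliminate the data leakage caused by correlations among the attribute variables themselves. This paper introduces an enhanced randomization method that uses principal component analysis (PCA) for attribute reduction, which lowers the correlation among the attributes involved in mining and better protects private data.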
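The idea of reducing correlated attributes with PCA before randomizing them can be sketched for the simplest case of two attributes. This is an assumption-laden illustration, not the paper's procedure: it uses the closed-form principal axis of a 2x2 covariance matrix, additive Gaussian noise, and synthetic data with hypothetical names (`income`, `spending`).

```python
import math
import random


def first_component_scores(xs, ys):
    """Project two correlated attributes onto their first principal component.

    Returns the 1-D scores and the variance they capture, i.e. the largest
    eigenvalue of the 2x2 population covariance matrix.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    a = sum(u * u for u in dx) / n              # var(x)
    c = sum(v * v for v in dy) / n              # var(y)
    b = sum(u * v for u, v in zip(dx, dy)) / n  # cov(x, y)
    theta = 0.5 * math.atan2(2 * b, a - c)      # orientation of the principal axis
    ux, uy = math.cos(theta), math.sin(theta)
    scores = [u * ux + v * uy for u, v in zip(dx, dy)]
    top_eigenvalue = (a + c) / 2 + math.hypot((a - c) / 2, b)
    return scores, top_eigenvalue


def randomize(values, sigma, rng):
    """Additive randomization: add zero-mean Gaussian noise to each value."""
    return [v + rng.gauss(0.0, sigma) for v in values]


rng = random.Random(0)  # seeded only to make the sketch repeatable
income = [rng.gauss(50.0, 10.0) for _ in range(2000)]
spending = [0.8 * x + rng.gauss(0.0, 2.0) for x in income]  # strongly correlated

scores, top_eigenvalue = first_component_scores(income, spending)
released = randomize(scores, sigma=5.0, rng=rng)
```

Only the single noisy component is released, so a second correlated column is no longer available to help filter out the noise. How many components to keep and how much noise to add are tuning choices that the abstract does not specify.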
Abstract: In the privacy preservation of association rules, sensitivity analysis should be reported after items have been quantified in terms of their occurrence. Traditional methodologies for preserving the confidentiality of association rules rely on assumptions when safeguarding sensitive information rather than on the recognition of insightful items. It is therefore time to go a step further and remove such assumptions in the protection of sensitive information, especially in XML association rule mining. We focus on this central and highly researched area in terms of generating XML association rules, without debating the disclosure risks involved in the mining process. We describe the identification of sensitive items, in order to hide confidential information, through a supervised learning technique. These sensitive items show a high dependency on other items, measured in terms of statistical significance with a Bayesian network. We propose two methodologies, based on the probabilistic occurrence of items and on the mode of items. All of this information is modeled and named the PPDM (Privacy Preservation in Data Mining) model for XARs. The PPDM model is helpful for sharing market information among competitors with a lower chance of creating a monopoly. Finally, the PPDM model introduces great accuracy in computing the sensitivity of items and opens new dimensions to academia for the standardization of such NP-hard problems.
Funding: Project supported by the National Natural Science Foundation of China (Nos. 60772098 and 60672068) and the New Century Excellent Talents in University of China (No. NCET-06-0393)
Abstract: Privacy is a critical requirement in distributed data mining. Cryptography-based secure multiparty computation is a main approach to privacy preservation, but it performs poorly in large-scale distributed systems. Data perturbation techniques, by contrast, are comparatively efficient but are mainly used in centralized privacy-preserving data mining (PPDM). In this paper, we propose a lightweight anonymous data perturbation method for efficient privacy preservation in distributed data mining. We first define the privacy constraints for data-perturbation-based PPDM in a semi-honest distributed environment. Two protocols are proposed to address these constraints and to protect data statistics and the randomization process against collusion attacks: the adaptive privacy-preserving summary protocol and the anonymous exchange protocol. Finally, a distributed data perturbation framework based on these protocols is proposed to realize distributed PPDM. Experimental results show that our approach achieves a high security level and is very efficient in a large-scale distributed environment.
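For context on what it means to share data statistics among semi-honest parties, here is a minimal simulation of the classic ring-based secure sum. This is a swapped-in textbook technique, not the adaptive privacy-preserving summary protocol or the anonymous exchange protocol named in the abstract, whose details are not given here. The function name, the modulus, and the sample counts are illustrative assumptions.

```python
import random


def ring_secure_sum(private_values, modulus=2**61 - 1, rng=None):
    """Simulate a ring-based secure sum among semi-honest parties.

    The initiator (party 0) hides the running total behind a random mask, so
    every message on the ring is a masked partial sum. Values must be
    non-negative integers whose true sum is below `modulus`.
    Returns (total, messages), where messages[i] is what party i forwards.
    """
    rng = rng or random.SystemRandom()
    mask = rng.randrange(modulus)  # known only to the initiator
    running = mask
    messages = []
    for value in private_values:
        running = (running + value) % modulus
        messages.append(running)   # masked partial sum passed to the next party
    total = (running - mask) % modulus  # the initiator removes its mask
    return total, messages


site_counts = [120, 45, 310, 78]  # hypothetical per-site counts
total, messages = ring_secure_sum(site_counts, rng=random.Random(42))
print(total)
```

In this simple version, the two neighbours of a party can collude and subtract the message it received from the message it sent to recover its value. That weakness is an example of the collusion attacks the abstract says its protocols are designed to resist.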