Improved K-anonymity privacy protection algorithm based on different sensitivities

Cited by: 1
Abstract: To address the need of machine learning for large numbers of real datasets that offer both data security and availability, an improved K-anonymity privacy protection algorithm based on Random Forest (RF), named RFK-anonymity privacy protection, was proposed. First, the RF algorithm was used to predict the sensitivity of each attribute value. Then, the k-means clustering algorithm was used to cluster the attribute values by sensitivity, and the K-anonymity algorithm was used to hide the data to different degrees according to the sensitivity clusters of the attribute values. Finally, users chose for themselves which hiding degree of data table they needed. Experimental results show that, on the Adult dataset, compared with data processed by the K-anonymity algorithm, the accuracy of data processed by the RFK-anonymity privacy protection algorithm increased by 0.5 and 1.6 percentage points at thresholds of 3 and 4, respectively; compared with data processed by the (p,α,k)-anonymity algorithm, it increased by 0.4 and 1.9 percentage points at thresholds of 4 and 5, respectively. The RFK-anonymity privacy protection algorithm can effectively improve data availability while protecting data privacy and security, and is better suited to classification and prediction in machine learning.
Authors: ZHAI Ran (翟冉); CHEN Xuebin (陈学斌); ZHANG Guopeng (张国鹏); PEI Langtao (裴浪涛); MA Zheng (马征). Affiliations: College of Sciences, North China University of Science and Technology, Tangshan Hebei 063210, China; Hebei Provincial Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan Hebei 063210, China; Tangshan Key Laboratory of Data Science, North China University of Science and Technology, Tangshan Hebei 063210, China
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 5, pp. 1497-1503 (7 pages)
Funding: National Natural Science Foundation of China (U20A20179).
Keywords: Random Forest (RF); K-anonymity; privacy protection; k-means; clustering algorithm
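
The pipeline summarized in the abstract (score sensitivity, cluster by sensitivity, then hide data per cluster and check K-anonymity) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the sensitivity scores are hypothetical hard-coded numbers per attribute (the paper predicts them with a Random Forest, per attribute value), the clustering is a plain 1-D k-means with fixed starting centers, and "hiding" is reduced to full suppression with `*` rather than graded generalization. The helper names `kmeans_1d`, `is_k_anonymous`, and `suppress` are made up for illustration.

```python
from collections import Counter


def kmeans_1d(values, centers, iterations=20):
    """Plain 1-D k-means (Lloyd's algorithm); returns a cluster index per value."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        # Move each center to the mean of its members (unchanged if empty).
        for c in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(n >= k for n in counts.values())


def suppress(rows, attributes):
    """Return copies of rows with the given attributes replaced by '*'."""
    return [
        {key: ("*" if key in attributes else val) for key, val in row.items()}
        for row in rows
    ]


# Hypothetical sensitivity scores (assumption: one score per attribute).
sensitivity = {"age": 0.9, "zip": 0.8, "sex": 0.1}
names = list(sensitivity)

# Two clusters: index 0 = low sensitivity, index 1 = high sensitivity.
labels = kmeans_1d([sensitivity[n] for n in names], centers=[0.0, 1.0])
high = {n for n, lab in zip(names, labels) if lab == 1}

# Toy records, invented for this example.
rows = [
    {"age": 34, "zip": "063210", "sex": "F"},
    {"age": 35, "zip": "063211", "sex": "F"},
    {"age": 51, "zip": "063000", "sex": "M"},
    {"age": 52, "zip": "063001", "sex": "M"},
]

# Hide only the attributes that fell into the high-sensitivity cluster.
masked = suppress(rows, high)

print(is_k_anonymous(rows, names, 2))    # raw table
print(is_k_anonymous(masked, names, 2))  # table after suppression
```

In this toy table every raw record is unique, so the raw data fails the 2-anonymity check, while the masked table passes because each remaining quasi-identifier combination covers two records. A graded scheme would replace `suppress` with per-cluster generalization levels (for example, age ranges of different widths), which is closer to the "different degrees of hiding" the abstract describes.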
