Abstract
To address the risk of privacy leakage caused by multiple associated attributes when publishing high-dimensional data, a differentially private publishing method for high-dimensional data with multiple associated attributes (HDMPDP) is proposed for distributed environments. Based on the data dimensionality, an efficient rough-set dimensionality reduction method using distributed partitioning is proposed; it partitions the feature attributes of high-dimensional complex data, reducing the data dimensionality while improving processing efficiency. An attribute classification criterion is designed, and attribute information entropy is used to improve the association analysis method. Noise is then added separately to the resulting attributes, and the noise-addition scheme is optimized to mitigate the privacy problems caused by associated attributes. Privacy-preserving data publishing is implemented on the Spark distributed framework, and experiments on high-dimensional data verify the effectiveness of the method and the security of its privacy protection.
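The abstract names two generic building blocks, attribute information entropy and noise addition under differential privacy, without giving formulas. The sketch below is only an illustration of those standard ideas and is not the HDMPDP algorithm: it assumes Shannon entropy over empirical frequencies, mutual information as the association score, and the Laplace mechanism on a count histogram with sensitivity 1 over a publicly known value domain. All function names and parameters are hypothetical.

```python
import math
import random
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy (in bits) of one attribute's empirical distribution."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y); an assumed entropy-based association score."""
    joint = list(zip(xs, ys))
    return attribute_entropy(xs) + attribute_entropy(ys) - attribute_entropy(joint)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(values, domain, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism on a count histogram.

    Assumes `domain` is public and that adding or removing one record
    changes exactly one count by 1 (L1 sensitivity 1).
    """
    counts = Counter(values)
    scale = sensitivity / epsilon
    return {v: counts.get(v, 0) + laplace_noise(scale, rng) for v in domain}
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of accuracy. How the paper splits the privacy budget across associated attributes is not stated in the abstract, so that step is not sketched here.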
Authors
褚治广
李俊燕
陈昊
张兴
CHU Zhi-guang; LI Jun-yan; CHEN Hao; ZHANG Xing (Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Key Laboratory of Security for Network and Data in Industrial Internet of Liaoning Province, Liaoning University of Technology, Jinzhou 121001, China)
Source
《计算机工程与设计》
Peking University Core Journal (北大核心)
2024, No. 4, pp. 967-973 (7 pages)
Computer Engineering and Design
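Funding
National Natural Science Foundation of China (61802161)
Scientific Research Fund of the Education Department of Liaoning Province (JZL202015404, LJKZ0625).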
Keywords
high-dimensional data
multi-associated attribute
differential privacy
distributed
association analysis
rough set
privacy protection