Abstract
In distributed environments, vertical partitioning is an effective method for protecting user privacy. However, current vertical partitioning strategies assume that the cloud service providers (CSPs) involved in data storage do not collude. Because CSPs may collude in practice, this study examines how to protect user data privacy in that setting. Assuming that n CSPs participate in data storage and that at most k of them may collude, this paper defines (k,n)-security for vertical partitioning and proposes MLVP (machine learning vertical partitioning), a scheme that computes vertical partitions automatically with the aid of machine learning. MLVP uses machine learning algorithms to analyze the correlations between attributes, optimizes the resulting set of correlations, and transforms the problem of computing a vertical partition into a satisfiability (SAT) problem, which is then solved with a SAT solver. The security of MLVP is analyzed theoretically. Experiments on real datasets compare how different machine learning algorithms and levels of privacy protection affect the partitioning results and performance. MLVP is also compared with two vertical partitioning schemes that do not consider CSP collusion, Oriol's scheme and Ciriani's scheme, in terms of computation speed and query speed. The results show that MLVP is slightly slower to compute, because it must guarantee security when some CSPs collude, but it improves query speed by 32.6% and 8.8% over Oriol's and Ciriani's schemes, respectively.
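The abstract does not give the exact (k,n)-security definition or the SAT encoding, so the following is only a minimal sketch under stated assumptions, not the MLVP implementation. It assumes that (k,n)-security means no coalition of at most k CSPs jointly stores every attribute of any confidentiality constraint (a set of attributes whose joint exposure is sensitive, such as one flagged by the correlation analysis). It encodes attribute placement as CNF clauses with DIMACS-style literals and solves them with a tiny DPLL routine standing in for a real SAT solver. The function names (`encode`, `dpll`, `partition`, `is_secure`), the attributes, and the constraints are hypothetical, and the machine learning and correlation-optimization steps are omitted.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Set

# DIMACS-style literal: +v means variable v is true, -v means it is false.
Clause = List[int]


def var(attr: int, csp: int, n: int) -> int:
    """Variable x(attr, csp): attribute `attr` is stored on CSP `csp`."""
    return attr * n + csp + 1


def encode(num_attrs: int, n: int, k: int,
           constraints: Sequence[Set[int]]) -> List[Clause]:
    """Build CNF clauses; ASSUMED (k,n)-security: no k CSPs jointly hold a constraint."""
    clauses: List[Clause] = []
    for a in range(num_attrs):
        # Each attribute goes to at least one CSP ...
        clauses.append([var(a, c, n) for c in range(n)])
        # ... and to at most one CSP (pairwise exclusion).
        for c1, c2 in combinations(range(n), 2):
            clauses.append([-var(a, c1, n), -var(a, c2, n)])
    for s in constraints:
        for coalition in combinations(range(n), k):
            outside = [c for c in range(n) if c not in coalition]
            # Some attribute of s must be stored outside this k-CSP coalition.
            clauses.append([var(a, c, n) for a in s for c in outside])
    return clauses


def dpll(clauses: List[Clause],
         assignment: Dict[int, bool]) -> Optional[Dict[int, bool]]:
    """Toy DPLL with unit propagation (a stand-in for a real SAT solver)."""
    simplified: List[Clause] = []
    for clause in clauses:
        if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
            continue  # clause already satisfied
        rest = [lit for lit in clause if abs(lit) not in assignment]
        if not rest:
            return None  # clause falsified: backtrack
        simplified.append(rest)
    if not simplified:
        return assignment  # every clause is satisfied
    for clause in simplified:
        if len(clause) == 1:  # unit clause forces its literal
            lit = clause[0]
            return dpll(simplified, {**assignment, abs(lit): lit > 0})
    lit = simplified[0][0]  # branch: try the literal true, then false
    for value in (lit > 0, lit <= 0):
        result = dpll(simplified, {**assignment, abs(lit): value})
        if result is not None:
            return result
    return None


def partition(attrs: Sequence[str], n: int, k: int,
              constraints: Sequence[Set[str]]) -> Optional[List[List[str]]]:
    """Return one attribute list per CSP, or None if no secure partition exists."""
    index = {name: i for i, name in enumerate(attrs)}
    idx_constraints = [{index[x] for x in s} for s in constraints]
    model = dpll(encode(len(attrs), n, k, idx_constraints), {})
    if model is None:
        return None
    fragments: List[List[str]] = [[] for _ in range(n)]
    for a, name in enumerate(attrs):
        # Exactly one x(a, c) is true in any satisfying model.
        c = next(c for c in range(n) if model.get(var(a, c, n)))
        fragments[c].append(name)
    return fragments


def is_secure(fragments: List[List[str]], k: int,
              constraints: Sequence[Set[str]]) -> bool:
    """Check the assumed property directly: no k fragments cover a constraint."""
    for coalition in combinations(range(len(fragments)), k):
        seen = set().union(*(fragments[c] for c in coalition))
        if any(s <= seen for s in constraints):
            return False
    return True


attrs = ["name", "zip", "birth", "disease", "salary"]
cons = [{"name", "disease"}, {"zip", "birth", "disease"}]
print(partition(attrs, 3, 1, cons))  # a secure 3-way split for k = 1
print(partition(attrs, 3, 2, cons))  # None: {name, disease} cannot span 3 CSPs
```

Any coalition of fewer than k CSPs is contained in some coalition of exactly k, so enumerating size-k coalitions is enough under this assumed definition. A larger instance would hand the same clauses to an off-the-shelf SAT solver rather than this toy DPLL.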
Authors
RUAN Huafeng, LI Rui, LUO Kailun (Dongguan University of Technology, Dongguan 523808, China)
Source
Chinese Journal of Network and Information Security (《网络与信息安全学报》), 2024, No. 5, pp. 175-187 (13 pages)
Funding
National Key Research and Development Program of China (2021YFB3101303)
National Natural Science Foundation of China (61972089, 62206055)
Keywords
vertical partitioning
privacy protection
k-anonymity
machine learning
satisfiability problem