摘要
空间位置数据分布通常具有不均匀性,不同位置区域的密度差异较大,在本地差分隐私模型中无法直接获取用户真实的位置数据,使得空间位置划分方法受到限制以及数据发布存在查询精度低、通信代价大等问题。为在本地差分隐私模型下的大规模空间数据采集和发布过程中进行空间划分,提出一种空间数据分层自适应划分算法KDG-HT。通过收集部分用户的数据来初步获取区域的分布情况,采用KD-树的思想划分区域,并利用抽样技术对用户进行分组,根据分组用户统计结果所提供的先验知识来完成多层细粒度划分。在此基础上,结合差分隐私模型的并行组合特性分层扰动用户数据,从总体上实现发布数据的ε-差分隐私保护。实验结果表明,KDG-HT算法适用于具有不同数据分布情况的大规模空间数据集,查询精度及运行效率优于RAPPOR、UG、GT-R等算法,其中与GT-R算法相比,KDG-HT算法发布数据的查询精度最高提升3倍,运行效率提高17%。
The distribution of spatial location data is typically uneven,and the density of different location areas varies significantly.In the local differential privacy model,the actual location data of users cannot be obtained directly,which limits the spatial location segmentation method.Additionally,problems exist in data publishing,such as low query accuracy and high communication cost.To segment a space during large-scale spatial data acquisition and distribution using the local differential privacy model,a hierarchical adaptive spatial data partition algorithm,KDG-HT,is proposed.By acquiring the data of some users to preliminarily obtain the distribution of a region,the region can be segmented using the concept of the KD-tree,and the users can be categorized via sampling technology.By applying prior knowledge provided by the statistical results of the grouped users,the multilayer fine-grained segmentation is completed.Subsequently,combined with the parallel combination characteristics of the differential privacy model,the user data are disturbed hierarchically,and the publishing data are realized as a completeε-differential privacy protection.Experimental results show that the KDG-HT algorithm is suitable for large-scale spatial datasets with different data distributions,and that its query accuracy and operation efficiency are better than those of RAPPOR,UG,GT-R,and other algorithms.Compared with the GT-R algorithm,the KDG-HT algorithm can improve the query accuracy of published data by up to three times and the operation efficiency by 17%.
作者
金媛媛
倪志伟
朱旭辉
陈恒恒
陈千
JIN Yuanyuan;NI Zhiwei;ZHU Xuhui;CHEN Hengheng;CHEN Qian(School of Management,Hefei University of Technology,Hefei 230009,China;Key Laboratory of Process Optimization and Intelligent Decision-Making,Ministry of Education,Hefei University of Technology,Hefei 230009,China)
出处
《计算机工程》
CAS
CSCD
北大核心
2022年第5期136-144,共9页
Computer Engineering
基金
国家自然科学基金“大数据环境下协同商务智能构建中的关键技术研究”(91546108)
安徽省科技重大专项“面向混合云数据中心的隐私安全防护关键技术及其虚拟化设备研发”(201903a05020020)。
关键词
本地差分隐私
空间自适应划分
用户随机采样
空间范围查询
随机响应
local differential privacy
spatial adaptive partition
user random sampling
spatial range query
randomized response