期刊文献+

基于本地差分隐私的键-值数据精确收集方法 被引量:5

Key-Value Data Accurate Collection under Local Differential Privacy
下载PDF
导出
摘要 基于本地差分隐私的键-值数据的收集与分析得到了研究者的广泛关注.键与值的值域大小、二者之间的关联性、报告给收集者的通信方式以及本地扰动机制直接制约着频率与均值估计的精度.针对现有键-值数据本地扰动方法存在的不足,该文提出了一种精确且有效的本地扰动方法LDPKV(Locally Differentially Private Key-Value data collection),该方法结合输入值域与输出值域之间的整体映射关系,在较少通信代价与不分割隐私预算的情况下,对键-值对进行统一处理.其主要思想是:首先对每个用户所拥有的键-值对进行统一离散化处理;结合每个用户的离散化结果,利用伯努利采样技术随机地抽样一条键-值对进行本地随机扰动;然后将扰动后的键-值对报告给收集者.收集者利用每个用户的报告值估计每个键的频率以及所对应值的均值.理论分析了LDPKV方法产生的方差与最大偏差,以及与现有键-值数据收集方法在真实与合成数据集上进行综合比较.实验结果表明LDPKV方法均优于同类方法. Privacy-preserving data collection is an important problem that has attracted much research attention.The state-of-the-art solution for this problem is local differential privacy,which provides a strong protection by adding random noise from users.Existing methods with local differential privacy,however,rarely focus on key-value data collection.In this paper,given a set U of users,we study locally differentially private algorithms for collecting key-value data over U to estimate the frequency of each key and mean of the value.Existing methods for key-value data collection mostly adopt direct encoding mechanisms and local perturbation mechanisms,which locally encode and perturb the value of each user.The two mechanisms,however,require that we have to transform the whole domain of key-value data and split the privacy budget.The transformation of the whole domain makes the communication cost too high.Furthermore,the partition of the budget leads to excessive noise and lower utility.To remedy the deficiency of existing methods,this paper proposes an accurate and efficient method with local differential privacy,called LDPKV(Locally Differentially Private Key-Value data collection)to collect and analyze key-value data.LDPKV firstly discretes the key-value data of each user in terms of the mapping between input and output domain.Based on the discretization,each user in LDPKV randomly samples a key-value pair by using Bernoulli sampling technology,and runs the local randomized response on the sample and reports the index of sample along with the perturbed value to the collector.After that,the collector in LDPKV accumulates all of the reports from users to estimate the frequency of each key and mean of values.Finally,we conduct theoretical analysis on variance and error bound of LDPKV.Based on two metrics:MSE(Mean Square Error)and RE(Relative Error),three groups of experiments were conducted on real and synthetic datasets(TalkingData,JData,GAUSS,PLAW,and LNR)to evaluate the accuracy of the frequency and the mean generated by from LDPKV,Rappor,KRR,PrivKV,PrivKVM,KVOH,and Harmony.For example,based on JData dataset,we fixε=1.6 to compare the accuracy of the above seven methods.It can be seen from Figure 2(a)that the MSE of LDPKV is 0.002,and 1/6 times that of Rappor and KRR,and 1/2 times that of PrivKV,PrivKVM,and KVOH,respectively.From Figure 2(b),whenεvaries from 0.1 to 3.2,LDPKV achieves a small value of MSE on mean evaluation against Rappor,Harmony,PrivKV,PrivKVM,and KVOH.Besides,based on three synthetic datasets,LDPKV still achieves better accuracy than Rappor,KRR,PrivKV,PrivKVM,KVOH,and Harmony.For instance,from Figure 3 to Figure 5,when varyingεfrom 0.1 to 0.8 and varying top k from top10 to top50,we can see that the RE of six methods decreases gradually.Especially,on fixingε=0.2 and top k=top10,the RE of LDPKV is 1/4 times,1/6 times that of Rappor and KRR on LNR and PLAW,respectively.From the results of the above experimental analysis,we can see that LDPKV is better than its competitors.
作者 张啸剑 付楠 孟小峰 ZHANG Xiao-Jian;FU Nan;MENG Xiao-Feng(School of Computer&Information Engineering,Henan University of Economics and Law,Zhengzhou 450046;School of Information,Renmin University of China,Beijing 100872)
出处 《计算机学报》 EI CSCD 北大核心 2020年第8期1479-1492,共14页 Chinese Journal of Computers
基金 国家自然科学基金项目(61502146,91746115,91646203,61772131) 河南省自然科学基金资助项目(162300410006) 河南省高等学校青年骨干教师培养计划项目(2017GGJS084) 河南省教育厅高等学校重点科研项目(16A520002) 河南财经政法大学青年拔尖人才资助计划项目资助.
关键词 键-值数据 隐私保护 本地差分隐私 频率估计 均值估计 key-value data privacy protection local differential privacy frequency estimation mean estimation
  • 相关文献

同被引文献37

引证文献5

二级引证文献6

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部