Abstract
In big data analysis, the data are too large for a single machine and are stored across different machines, so common statistical methods cannot be applied directly and distributed algorithms are needed. Both divide-and-conquer and multi-center settings require transmitting the data or intermediate results of the computation. This transmission must be both privacy-preserving and efficient. Moreover, too many rounds of transmission not only reduce computational efficiency but also make privacy protection harder. Motivated by this, the paper proposes two types of privacy-preserving estimation for communication-efficient distributed parameter estimation under the differential privacy model. We rigorously prove that the proposed approach protects data security without compromising the validity of parameter estimation. Finally, simulations and a real-data example validate parameter estimation based on the differentially private algorithms under the linear model and illustrate the loss in estimation caused by privacy protection.
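To make the setting concrete, here is a minimal sketch of differentially private, one-round distributed estimation for a one-parameter linear model. It is not the paper's scheme, which the abstract does not specify. It assumes a no-intercept model y = beta * x, a hypothetical clipping bound `bound` on each local estimate, and the standard Laplace mechanism. All function names are illustrative.

```python
import random


def local_slope(xs, ys):
    """Least-squares slope for y = beta * x (no intercept) on one machine."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


def clip(value, bound):
    """Project value onto the interval [-bound, bound]."""
    return max(-bound, min(bound, value))


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_local_release(xs, ys, bound, epsilon, rng):
    """Release one noisy number per machine.

    Assumption: clipping the estimate to [-bound, bound] caps its sensitivity
    at 2 * bound, so Laplace noise with scale 2 * bound / epsilon makes this
    single release epsilon-differentially private.
    """
    clipped = clip(local_slope(xs, ys), bound)
    return clipped + laplace_noise(2.0 * bound / epsilon, rng)


def distributed_estimate(machines, bound, epsilon, rng):
    """Average the noisy local releases; one transmission per machine."""
    releases = [
        private_local_release(xs, ys, bound, epsilon, rng)
        for xs, ys in machines
    ]
    return sum(releases) / len(releases)


def simulate_machines(beta, n_machines, n_per_machine, rng):
    """Generate synthetic data y = beta * x + noise, split across machines."""
    machines = []
    for _ in range(n_machines):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_per_machine)]
        ys = [beta * x + rng.gauss(0.0, 1.0) for x in xs]
        machines.append((xs, ys))
    return machines


if __name__ == "__main__":
    rng = random.Random(0)
    machines = simulate_machines(1.5, 200, 100, rng)
    for epsilon in (0.5, 2.0, 8.0):
        estimate = distributed_estimate(machines, 3.0, epsilon, rng)
        print(f"epsilon={epsilon}: estimate={estimate:.3f}")
```

In this sketch each machine transmits a single number, so the privacy budget is spent once per machine. A smaller `epsilon` means stronger privacy but larger noise in the averaged estimate, which is the trade-off the abstract calls the loss from privacy protection.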
Authors
郁淼淼
李子洋
周勇
YU Miaomiao; LI Ziyang; ZHOU Yong (Academy of Statistics and Interdisciplinary Sciences and School of Statistics, East China Normal University, Shanghai 200062, China; Key Laboratory of Advanced Theory and Application in Statistics and Data Science (MOE), Shanghai 200062, China)
Source
《应用数学学报》
CSCD
Peking University Core Journal
2023, Issue 2, pp. 145-165 (21 pages)
Acta Mathematicae Applicatae Sinica
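Funding
National Key R&D Program of China (2021YFA1000100, 2021YFA1000101)
National Natural Science Foundation of China, Major Research Plan Cultivation Project (92046005)
National Natural Science Foundation of China, Key Program (71931004)
China Postdoctoral Science Foundation, General Program (2021M691036)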
Keywords
data privacy
differential privacy
communication-efficient
parameter estimation
noise mechanism