Abstract
To address privacy leakage in histogram publication and the difficulty of determining the number of groups (bins), a non-equidistant histogram data publishing algorithm based on differential privacy (DP) is proposed. First, an improved quantitative comprehensive evaluation index is introduced. It turns the criteria for judging a histogram's grouping into a specific calculation formula, which determines the optimal number of histogram groups. Next, the empirical distribution function is used to design a privacy budget allocation scheme, from which the group boundaries are calculated to construct the non-equidistant histogram. Finally, the data are divided into groups according to the non-equidistant boundaries, the frequency of each group is counted, and noise is added to the frequencies so that the published non-equidistant histogram satisfies differential privacy. Experimental results show that computing the optimal number of groups and using non-equidistant boundaries preserve both the accuracy and the privacy of the published histogram data, while leaving the distribution characteristics of the histogram intact. Compared with the comparable accurate histogram publication (AHP) algorithm, the proposed algorithm reduces the mean squared error by 99%.
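The abstract outlines the pipeline but does not give the improved evaluation index or the exact ECDF-based budget allocation, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes the group count `k` is supplied by the caller (standing in for the index-based optimum), the data domain `[lo, hi]` is public, and noise is Laplace. Boundaries are taken from a noisy empirical CDF built on a fine equal-width grid, a stand-in technique for the paper's allocation scheme. The names `dp_nonuniform_histogram`, `eps1`, `eps2` and `grid` are hypothetical. Each counting step has sensitivity 1 under add/remove-one-record neighbours, so by sequential composition the whole release is (`eps1` + `eps2`)-differentially private.

```python
import bisect
import random


def laplace(scale, rng):
    # Laplace(0, scale) sampled as the difference of two i.i.d. exponentials with mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_nonuniform_histogram(data, k, lo, hi, eps1, eps2, grid=100, seed=None):
    """Hypothetical sketch: eps1 pays for bin edges, eps2 pays for bin counts.

    Assumes [lo, hi] and `grid` are public; returns (edges, noisy_counts)
    with at most k non-equidistant bins.
    """
    rng = random.Random(seed)
    width = (hi - lo) / grid

    def cell(x):
        # Index of the fine grid cell holding x, clamped to [0, grid - 1].
        return min(max(int((x - lo) / width), 0), grid - 1)

    # Step 1: noisy counts on a fine equal-width grid (sensitivity 1, scale 1/eps1).
    fine = [0] * grid
    for x in data:
        fine[cell(x)] += 1
    noisy = [max(c + laplace(1.0 / eps1, rng), 0.0) for c in fine]

    # Step 2: noisy empirical CDF; pure post-processing, no extra budget.
    total = sum(noisy) or 1.0
    cdf, acc = [], 0.0
    for c in noisy:
        acc += c
        cdf.append(acc / total)

    # Step 3: interior edge j sits where the noisy CDF first reaches j/k.
    edges = [lo]
    for j in range(1, k):
        i = bisect.bisect_left(cdf, j / k)
        edge = lo + (i + 1) * width
        if edges[-1] < edge < hi:  # keep edges strictly increasing and interior
            edges.append(edge)
    edges.append(hi)

    # Step 4: fresh counts in the non-equidistant bins (sensitivity 1, scale 1/eps2).
    counts = [0] * (len(edges) - 1)
    for x in data:
        b = bisect.bisect_right(edges, x) - 1
        counts[min(max(b, 0), len(counts) - 1)] += 1
    released = [c + laplace(1.0 / eps2, rng) for c in counts]
    return edges, released


if __name__ == "__main__":
    gen = random.Random(0)
    ages = [gen.gauss(40, 12) for _ in range(5000)]  # synthetic example data
    edges, noisy_counts = dp_nonuniform_histogram(
        ages, k=8, lo=0.0, hi=100.0, eps1=0.5, eps2=0.5, seed=42
    )
    for a, b, c in zip(edges, edges[1:], noisy_counts):
        print(f"[{a:5.1f}, {b:5.1f}): {c:8.1f}")
```

Because the edges follow the (noisy) data distribution, narrow bins land where the data are dense and wide bins where they are sparse. How to split the total budget between `eps1` and `eps2` is exactly the trade-off the paper's allocation scheme is meant to settle; the even split above is arbitrary. Released counts may be negative; clamping them afterwards is post-processing and costs no extra privacy budget.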
Authors
SHAN Liyang (单丽洋)
CHEN Xuebin (陈学斌)
GUO Rumin (郭如敏)
College of Science, North China University of Science and Technology, Tangshan 063210, Hebei, China; Hebei Provincial Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan 063210, Hebei, China; Tangshan Key Laboratory of Data Science, North China University of Science and Technology, Tangshan 063210, Hebei, China
Source
Journal of Applied Sciences (《应用科学学报》)
Indexed in: CAS; CSCD; Peking University Core Journals (北大核心)
2024, No. 6, pp. 1052-1063 (12 pages)
Funding
Supported by the National Natural Science Foundation of China (No. U20A20179).
Keywords
non-isometric
histogram grouping
differential privacy (DP)
privacy budget