Abstract
Traditional spectral clustering algorithms with differential privacy protection often produce unsatisfactory clustering results. To address this, a new adaptive spectral clustering optimization algorithm for differential privacy protection is proposed. A mutual-neighbor Gaussian kernel function is used to obtain a sparse similarity matrix, and the relationship between the data features of high-dimensional data sets and the number of clusters is analyzed to resolve the uncertainty in both the degree of dimensionality reduction and the number of clusters. The concepts of an intermediate information vector and intermediateness are introduced to overcome the blindness of initial cluster center selection. Outliers are detected with a multidimensional Gaussian distribution test and then handled by imputation. Simulation results show that the algorithm effectively overcomes the shortcomings of traditional methods: on the same data set and with the same privacy protection parameters, it improves clustering efficiency and significantly improves clustering availability while preserving data privacy.
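The sparse similarity matrix mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact definition, so the version below rests on an assumption: two points receive a Gaussian weight only when each is among the other's k nearest neighbors, and the weight is zero otherwise. The function name and the parameters `k` and `sigma` are hypothetical, and no differential privacy noise is added here.

```python
import math

def mutual_knn_gaussian_similarity(points, k=2, sigma=1.0):
    """Illustrative sparse similarity matrix (assumed definition, not the paper's).

    W[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) if i and j are in each
    other's k nearest neighbors, and 0 otherwise.
    """
    n = len(points)
    # Pairwise squared Euclidean distances.
    d2 = [[sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
           for j in range(n)] for i in range(n)]
    # k nearest neighbors of each point, excluding the point itself.
    knn = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: d2[i][j])
        knn.append(set(order[:k]))
    # Keep a Gaussian weight only for mutual neighbors; the result is symmetric.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:
                W[i][j] = math.exp(-d2[i][j] / (2 * sigma ** 2))
    return W

# Two well-separated pairs of points: only within-pair entries are nonzero.
W = mutual_knn_gaussian_similarity([(0, 0), (0, 1), (10, 10), (10, 11)], k=1)
```

Requiring neighborhood in both directions drops weak, one-sided links, which is one common way to make the similarity graph sparse before the spectral decomposition.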
Authors
金亦乔 (Jin Yiqiao)
章永祺 (Zhang Yongqi)
王博 (Wang Bo)
王鑫轲 (Wang Xinke)
李昭祥 (Li Zhaoxiang)
College of Mathematics and Physics, Shanghai Normal University, Shanghai 200234, China; College of Information and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2023, Issue 9, pp. 261-266 (6 pages)
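Funding
National Natural Science Foundation of China (11871043, 12271366, 12171322)
Shanghai Science and Technology Program (20JC1414200)
Natural Science Foundation of Shanghai (21ZR1447200, 22ZR1445500)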
Keywords
Privacy preserving
Differential privacy
Spectral clustering
Clustering availability