Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients

Abstract: Federated learning (FL) is a new privacy-preserving distributed learning paradigm. Compared with traditional distributed machine learning, it is characterized by 1) large differences in communication, computing power, and storage capacity among client devices (device heterogeneity), 2) large differences in data distribution and data volume among clients (data heterogeneity), and 3) high communication cost. Under client heterogeneity (both device and data heterogeneity), the data distributions of clients differ widely, which significantly slows model convergence. Under extreme data heterogeneity in particular, traditional FL algorithms fail to converge, and the training curve fluctuates sharply as the number of local iterations increases. To address the impact of client heterogeneity on model training, this work proposes FedSSO, an FL algorithm based on stratified sampling optimization. FedSSO uses a density-based clustering method to partition the client population into clusters so that the clients within each cluster are highly similar, and then draws available clients from the different clusters according to sample weight to participate in training. As a result, every kind of data takes part in each training round in proportion to its sample weight, which accelerates convergence to the global optimum. A learning rate decay strategy and a mechanism for choosing the number of local iterations are also set to guarantee convergence. The convergence of FedSSO is established both theoretically and experimentally, and comparisons with other FL algorithms on the public MNIST, Cifar-10, and Sentiment140 datasets show that FedSSO achieves better training results.
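
The sampling step described in the abstract (draw clients from each cluster according to sample weight) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cluster labels are taken as given (the density-based clustering step is not shown), a cluster's weight is assumed to be the total number of samples held by its clients, and the function names `allocate_per_cluster` and `stratified_client_sample` are hypothetical.

```python
import random
from collections import defaultdict


def allocate_per_cluster(cluster_weights, num_selected):
    """Split num_selected slots across clusters in proportion to their weights.

    Uses largest-remainder rounding so the slots sum exactly to num_selected.
    """
    total = sum(cluster_weights.values())
    quotas = {c: num_selected * w / total for c, w in cluster_weights.items()}
    slots = {c: int(q) for c, q in quotas.items()}
    leftover = num_selected - sum(slots.values())
    # Hand the remaining slots to the clusters with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - slots[c], reverse=True)
    for c in by_remainder[:leftover]:
        slots[c] += 1
    return slots


def stratified_client_sample(cluster_of, num_samples, num_selected, rng):
    """Pick clients for one training round, stratified by cluster.

    cluster_of:  client id -> cluster label (assumed output of a
                 density-based clustering step that is not shown here)
    num_samples: client id -> number of local training samples
    """
    members = defaultdict(list)
    for client, label in cluster_of.items():
        members[label].append(client)
    # Assumption: cluster weight = total samples held by the cluster's clients.
    weights = {c: sum(num_samples[k] for k in ks) for c, ks in members.items()}
    slots = allocate_per_cluster(weights, num_selected)
    chosen = []
    for c, ks in members.items():
        # Cap at the cluster size, so fewer than num_selected clients
        # may be returned when a small cluster holds a large weight.
        chosen.extend(rng.sample(sorted(ks), min(slots[c], len(ks))))
    return chosen


if __name__ == "__main__":
    # Toy data: six clients in three clusters of equal total weight.
    cluster_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "c"}
    num_samples = {0: 100, 1: 100, 2: 100, 3: 150, 4: 150, 5: 300}
    print(stratified_client_sample(cluster_of, num_samples, 3, random.Random(0)))
```

With the toy data above, each cluster holds one third of the samples, so selecting three clients yields exactly one client per cluster. Plain uniform sampling over all six clients would give no such guarantee, which is the motivation for stratifying.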

Authors: LU Chen-yang; DENG Su; MA Wu-bin; WU Ya-hui; ZHOU Hao-hao (Science and Technology on Information Systems Engineering Laboratory, National University of Defence Technology, Changsha 410073, China)

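Affiliation: Science and Technology on Information Systems Engineering Laboratory, National University of Defence Technology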

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 9, pp. 183-193 (11 pages)

Funding: National Natural Science Foundation of China, General Program (61871388).

Keywords: Federated learning; Privacy protection; Clustering; Stratified sampling; Distributed optimization; Convergence analysis

Classification: TP301 [Automation and Computer Technology: Computer Systems Architecture]
