Abstract
Federated learning (FL) is a new privacy-preserving distributed learning paradigm. Compared with traditional distributed machine learning, it is characterized by (1) large differences in communication, computing, and storage capability among client devices (device heterogeneity), (2) large differences in data distribution and data volume among clients (data heterogeneity), and (3) high communication cost. Under client heterogeneity (both device and data heterogeneity), the data distributions of clients differ widely, which significantly slows model convergence. In cases of extreme data heterogeneity in particular, traditional FL algorithms fail to converge, and the training curve fluctuates sharply as the number of local iterations increases. To address the impact of client heterogeneity on model training, this work proposes FedSSO, an FL algorithm based on stratified sampling optimization. FedSSO uses a density-based clustering method to partition the full set of clients into clusters so that clients within each cluster are highly similar, and then draws available clients from the different clusters according to sample weight to participate in training. As a result, every kind of data takes part in each training round in proportion to its sample weight, which accelerates convergence of the model to the global optimum. In addition, a learning rate decay strategy and a mechanism for choosing the number of local iterations are set to guarantee convergence. The convergence of FedSSO is established both theoretically and experimentally, and comparisons with other FL algorithms on the public MNIST, Cifar-10, and Sentiment140 datasets show that FedSSO achieves better training results.
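The stratified client selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the density-based clustering has already produced cluster memberships, it interprets "sample weight" as each cluster's share of the total number of training samples, and the names `allocate_slots`, `stratified_sample`, `clusters`, and `sample_counts` are hypothetical.

```python
import random


def allocate_slots(cluster_sizes, num_selected):
    """Split num_selected slots across clusters in proportion to cluster_sizes.

    Uses largest-remainder rounding so the slots always sum to num_selected.
    ASSUMPTION: a cluster's weight is its share of the total sample count.
    """
    total = sum(cluster_sizes.values())
    quotas = {c: num_selected * s / total for c, s in cluster_sizes.items()}
    slots = {c: int(q) for c, q in quotas.items()}
    leftover = num_selected - sum(slots.values())
    # Give the remaining slots to the clusters with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - slots[c], reverse=True)
    for c in by_remainder[:leftover]:
        slots[c] += 1
    return slots


def stratified_sample(clusters, sample_counts, num_selected, rng):
    """Pick clients from every cluster according to its allocated slots.

    clusters: dict mapping cluster id -> list of client ids (assumed given,
              e.g. produced beforehand by a density-based clustering step).
    sample_counts: dict mapping client id -> number of local samples.
    """
    sizes = {c: sum(sample_counts[k] for k in members)
             for c, members in clusters.items()}
    slots = allocate_slots(sizes, num_selected)
    chosen = []
    for c, members in clusters.items():
        # A cluster cannot contribute more clients than it contains,
        # so fewer than num_selected clients may be returned in that case.
        k = min(slots[c], len(members))
        chosen.extend(rng.sample(members, k))
    return chosen


# Hypothetical example: three clusters, 100 samples per client.
clusters = {"a": [0, 1, 2, 3], "b": [4, 5], "c": [6, 7, 8, 9]}
sample_counts = {client: 100 for client in range(10)}
selected = stratified_sample(clusters, sample_counts, 5, random.Random(0))
```

In this example the clusters hold 40%, 20%, and 40% of the samples, so five slots are split 2, 1, and 2, and every cluster is represented in the round. Uniform sampling over all ten clients could leave a small cluster out entirely.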
Authors
鲁晨阳
邓苏
马武彬
吴亚辉
周浩浩
LU Chen-yang; DENG Su; MA Wu-bin; WU Ya-hui; ZHOU Hao-hao (Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, China)
Source
《计算机科学》
CSCD
PKU Core Journal
2022, Issue 9, pp. 183-193 (11 pages)
Computer Science
Funding
General Program of the National Natural Science Foundation of China (61871388).
Keywords
Federated learning
Privacy protection
Clustering
Stratified sampling
Distributed optimization
Convergence analysis