Abstract
Federated learning effectively addresses the problem of isolated data islands, but several challenges remain. First, the training nodes in federated learning exhibit substantial hardware heterogeneity, which affects training speed and model performance. Existing work focuses mainly on federated optimization, yet most methods do not address the resource waste caused by mismatched per-node computation times under the synchronous communication mode. In addition, most training nodes in federated learning are mobile devices with poor network conditions, so communication overhead is high and the network bottleneck is severe. Existing methods reduce communication overhead by compressing the gradients uploaded by the training nodes, but this inevitably degrades model performance, making a good balance between quality and efficiency hard to achieve. To tackle these problems, in the computation stage this paper proposes Adaptive Federated Averaging (AFA), which adaptively coordinates the number of local training iterations according to each node's hardware performance, minimizing the overall idle time spent waiting for the global gradient download and improving the computational efficiency of federated learning. In the communication stage, it proposes Double Sparsification (DS), which minimizes communication overhead by sparsifying gradients on both the training nodes and the parameter server. Furthermore, each training node performs error compensation based on the values dropped from its local gradient and from the global gradient, trading a small loss in model performance for a large reduction in communication overhead. Experimental results on an image classification dataset and a time-series prediction dataset show that the proposed method effectively improves the training speedup of federated learning and also yields some improvement in model performance.
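The AFA idea described in the abstract, coordinating per-node local iteration counts so that heterogeneous nodes finish a synchronous round at roughly the same time, can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm; the function name, the per-node timing inputs, and the simple proportional rule are all assumptions for illustration.

```python
# Hypothetical sketch of the AFA scheduling idea: give faster nodes more
# local iterations per round so that no node sits idle waiting for the
# global gradient download. The proportional rule below is illustrative.

def adaptive_local_iterations(time_per_iter, base_iters):
    """time_per_iter: seconds one local iteration takes on each node.
    base_iters: how many local iterations the slowest node should run.
    Returns a per-node list of local iteration counts."""
    slowest = max(time_per_iter)
    round_budget = slowest * base_iters  # wall-clock length of one round
    # Each node fills the same round budget with as many iterations as
    # its hardware allows, instead of idling after base_iters.
    return [max(1, int(round_budget / t)) for t in time_per_iter]

counts = adaptive_local_iterations([0.5, 1.0, 2.0], base_iters=4)
print(counts)  # the 4x-faster node runs 4x the iterations of the slowest
```

Under this rule every node's local computation occupies the same wall-clock window, so the synchronous aggregation step starts without per-node idle gaps.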
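The DS mechanism, top-k gradient sparsification on both the training node and the parameter server, combined with per-node error compensation for the dropped values, can likewise be sketched. The class and function names below are hypothetical, and the error-feedback scheme shown (accumulating the locally dropped values and re-injecting them before the next upload) is one standard way to realize the compensation the abstract describes, not necessarily the paper's exact formulation.

```python
# Hypothetical sketch of the DS (Double Sparsification) idea: gradients
# are top-k sparsified once on each training node before upload and once
# more on the parameter server after averaging; each node accumulates
# the values it dropped and adds them back before the next round.

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad, zero the rest.
    Returns (sparse gradient, residual of dropped values)."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    sparse = [g if i in keep else 0.0 for i, g in enumerate(grad)]
    residual = [g - s for g, s in zip(grad, sparse)]
    return sparse, residual

class DSNode:
    """Training node that uploads sparse gradients with error feedback."""
    def __init__(self, dim):
        self.err = [0.0] * dim  # values lost to sparsification so far

    def upload(self, grad, k):
        # Error compensation: re-inject previously dropped values,
        # then sparsify the corrected gradient for upload.
        corrected = [g + e for g, e in zip(grad, self.err)]
        sparse, self.err = topk_sparsify(corrected, k)
        return sparse

def server_aggregate(uploads, k):
    """Average the node uploads, then sparsify again before broadcast."""
    n, dim = len(uploads), len(uploads[0])
    avg = [sum(u[i] for u in uploads) / n for i in range(dim)]
    sparse, _ = topk_sparsify(avg, k)  # second sparsification pass
    return sparse
```

Because the residual is fed back rather than discarded, every coordinate's contribution eventually reaches the server, which is what lets the scheme trade a small model-performance loss for a large bandwidth reduction.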
Authors
冯晨
顾晶晶
FENG Chen; GU Jingjing (School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2023, No. 11, pp. 317-326 (10 pages)
Computer Science
Funding
National Natural Science Foundation of China (62072235).
Keywords
Federated learning
Distributed machine learning
Parallel computing
Parameter synchronization
Sparse representation