Abstract
Federated Learning (FL) is a new distributed machine learning paradigm that breaks down data barriers while protecting device data privacy, enabling clients to collaboratively train a machine learning model without sharing local data. However, handling Non-Independent and Identically Distributed (Non-IID) data from different clients remains a major challenge for FL, and some existing solutions fail to exploit the implicit relationship between the local and global models, so they cannot solve the problem simply and efficiently. To address the Non-IID data problem across clients in FL, two new FL optimization algorithms were proposed: Federated Self-Regularization (FedSR) and Dynamic Federated Self-Regularization (Dyn-FedSR). In each training round, FedSR introduces a self-regularization penalty term that dynamically modifies the local loss function and, by building a relationship between the local and global models, pulls the local model toward the global model that aggregates rich knowledge, thereby alleviating the client drift problem caused by Non-IID data. Dyn-FedSR extends FedSR by determining the self-regularization coefficient dynamically from the similarity between the local and global models. Extensive experiments on different tasks demonstrate that FedSR and Dyn-FedSR significantly outperform FL algorithms such as the Federated Averaging (FedAvg) algorithm, the Federated Proximal (FedProx) optimization algorithm, and the Stochastic Controlled Averaging (SCAFFOLD) algorithm in various scenarios, achieving efficient communication and high accuracy while remaining robust to imbalanced data and uncertain local updates.
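The mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact formulation: the abstract does not give the concrete penalty or the similarity-based rule, so the squared-distance penalty, the cosine-similarity signal, and the names `self_regularized_loss`, `dynamic_coefficient`, and `base_mu` are all assumptions introduced here for illustration.

```python
import math

def self_regularized_loss(task_loss, local_w, global_w, mu):
    # FedSR-style local objective (sketch): the ordinary task loss plus a
    # self-regularization penalty pulling the local weights toward the
    # current global weights, which limits client drift on Non-IID data.
    penalty = 0.5 * mu * sum((lw - gw) ** 2 for lw, gw in zip(local_w, global_w))
    return task_loss + penalty

def dynamic_coefficient(local_w, global_w, base_mu=1.0):
    # Dyn-FedSR-style coefficient (sketch): set the penalty strength from
    # the similarity between local and global models; here cosine similarity
    # is assumed as the similarity measure, so more drift (lower similarity)
    # yields a stronger regularization term.
    dot = sum(l * g for l, g in zip(local_w, global_w))
    norm = math.sqrt(sum(l * l for l in local_w)) * math.sqrt(sum(g * g for g in global_w))
    cos_sim = dot / norm if norm else 1.0
    return base_mu * (1.0 - cos_sim)
```

With identical local and global directions the dynamic coefficient vanishes, so a client that tracks the global model is left unconstrained, while a drifting client is penalized more strongly.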
Authors
LAN Mengjie; CAI Jianping; SUN Lan (College of Computer and Data Science, Fuzhou University, Fuzhou, Fujian 350108, China)
Source
Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journal list, 2023, No. 7, pp. 2073-2081 (9 pages)
Keywords
Federated Learning (FL)
Non-Independent Identical Distribution (Non-IID)
client drift
regularization
distributed machine learning
privacy preservation