
Communication-efficient Federated Learning from Imbalanced Data (cited by: 1)
Abstract: Federated learning (FL) is a distributed machine learning method in which a central server aggregates machine learning models trained locally on mobile terminals, so that multiple participants can collaborate in efficient machine learning. At the same time, FL does not require terminals to send their private data to the central server, thereby protecting data privacy. However, unlike ordinary training data sets, the data distribution across terminal systems is imbalanced, which degrades the communication efficiency of FL. To address this problem, an FL algorithm based on aggregation weights derived from the data distribution is proposed. The balance of each participant's local data set is quantified by computing the Hellinger distance between the local data set and a balanced data set, and the participants' weights during aggregation are adjusted accordingly, so as to reduce the number of communication rounds required for the algorithm to converge or to reach the target accuracy. The proposed algorithm is evaluated through simulation experiments on public data sets. The experimental results show that, compared with the Federated Averaging algorithm, the communication cost is reduced by more than 14.6%, which effectively improves the communication efficiency of FL on imbalanced data.
Authors: SHU Zhi-hong (舒志鸿); SHEN Su-bin (沈苏彬). Affiliations: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210046, China; National Engineering Research Center on Communication and Networking, Nanjing University of Posts and Telecommunications, Nanjing 210046, China.
Source: Computer Technology and Development (《计算机技术与发展》), 2021, No. 12, pp. 33-38 (6 pages).
Fund: China communications standardization international standard development project (2018外122).
Keywords: federated learning; machine learning; imbalanced data; Hellinger distance; aggregation
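The abstract describes the core mechanism only at a high level: each client's label balance is quantified by the Hellinger distance to a balanced distribution, and the aggregation weights of a FedAvg-style server step are adjusted accordingly. The following is a minimal Python sketch of that idea, not the authors' implementation; the function names (`hellinger_distance`, `balance_score`, `aggregate_weighted`), the use of a uniform distribution as the balanced reference, and the way data volume and balance are combined into a weight are assumptions made for illustration.

```python
# Hypothetical sketch of Hellinger-distance-weighted aggregation (not the paper's code).
import numpy as np

def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions p and q (lies in [0, 1])."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def balance_score(label_counts):
    """Balance of a client's data set: 1 = perfectly balanced, 0 = maximally skewed."""
    counts = np.asarray(label_counts, dtype=float)
    p = counts / counts.sum()              # empirical label distribution on the client
    q = np.full_like(p, 1.0 / len(p))      # balanced (uniform) reference distribution
    return 1.0 - hellinger_distance(p, q)

def aggregate_weighted(client_models, client_sizes, client_label_counts):
    """FedAvg-style aggregation with weights adjusted by data-distribution balance."""
    sizes = np.asarray(client_sizes, dtype=float)
    balances = np.array([balance_score(c) for c in client_label_counts])
    weights = sizes * balances             # assumed combination of data volume and balance
    weights /= weights.sum()
    # Weighted average of each parameter tensor across clients.
    return [
        sum(w * params[k] for w, params in zip(weights, client_models))
        for k in range(len(client_models[0]))
    ]

# Usage: three clients, each with two parameter tensors; client 2 is highly imbalanced
# and therefore contributes less to the global model than its sample count alone suggests.
clients = [[np.ones((2, 2)) * i, np.ones(2) * i] for i in range(1, 4)]
sizes = [100, 200, 50]
label_counts = [[50, 50], [190, 10], [25, 25]]
global_model = aggregate_weighted(clients, sizes, label_counts)
print(global_model[0])
```

Multiplying sample count by the balance score is only one plausible weighting rule; the paper's exact weighting formula is not reproduced here.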

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部