Abstract
Terminal devices such as mobile phones and wearables generate massive amounts of data every day, but these data often involve sensitive private information and cannot be directly released or used. To enable machine learning under privacy protection, federated learning was proposed: it builds a collaborative training mechanism that learns a high-performance global model without sharing client data. In practice, however, existing federated learning mechanisms face two major limitations: (1) the global model must account for the data of many clients, yet each client typically holds only a subset of the classes and the amount of data per class is severely imbalanced, which makes the global model hard to train; (2) the data distributions of different clients usually differ greatly, so the local models also differ greatly, and the traditional approach of obtaining the global model by weighted averaging of model parameters becomes ineffective. To reduce the impact of client-side class imbalance and distribution differences, this paper proposes a Class-Balanced Federated Learning (CBFL) method based on data generation. CBFL uses data generation techniques to construct, for each client, a class-balanced data set suitable for learning the global model. To this end, CBFL designs a class distribution equalizer consisting of a class-balanced sampler and a data generator: the class-balanced sampler samples the classes for which a client lacks data with higher probability, and the data generator then generates corresponding dummy data for the sampled classes, so that the client's class distribution is balanced for subsequent model training. To verify the effectiveness of the proposed method, extensive experiments were conducted on four benchmark datasets. The results show that the proposed method substantially improves federated learning performance: for example, on the CIFAR-100 dataset, the ResNet20 model trained with CBFL improves classification accuracy by 5.82% over existing methods.
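The abstract above refers to the conventional way of forming the global model: element-wise weighted averaging of client model parameters (as in FedAvg). The following is a minimal sketch of that baseline, assuming PyTorch-style state dicts; all names are illustrative and not taken from the paper.

```python
# Minimal sketch of conventional federated parameter averaging (FedAvg-style):
# the server element-wise averages client model parameters, weighted by each
# client's data size. Tensor layout and names are illustrative assumptions.
from typing import Dict, List
import torch


def weighted_average(client_states: List[Dict[str, torch.Tensor]],
                     client_sizes: List[int]) -> Dict[str, torch.Tensor]:
    """Element-wise weighted average of client model parameters."""
    total = float(sum(client_sizes))
    weights = [n / total for n in client_sizes]
    # Average every parameter tensor across clients, weighted by data size.
    return {
        name: sum(w * state[name].float() for w, state in zip(weights, client_states))
        for name in client_states[0]
    }
```

When the class distributions on clients differ sharply, the locally trained parameters diverge and this element-wise average degrades, which is the limitation that motivates CBFL.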
Modern terminal devices such as mobile phones and wearable devices produce massive amounts of data every day, but these data often involve sensitive privacy and thus cannot be directly disclosed and used. To solve this problem, Federated Learning (FL) has been developed as an important machine learning framework under privacy protection, which allows extensive terminal devices/clients to collaboratively learn a superior global model without sharing the private data on the clients. However, in practical applications, there are still two underlying limitations to existing FL mechanisms. First, the global model needs to consider the data on multiple clients, but each client usually contains only partial classes of data and the amount of data in different classes is severely imbalanced, making it difficult to train the global model. Specifically, most data on a client belong to a few classes, while other classes have few or no data. As a result, the trained local models tend to overfit the data on the clients and achieve poor performance on global data, which severely affects the training of the global model. Second, the data distribution is extremely different across clients, which causes the trained models on the clients to be quite different, making it hard to derive a promising global model. In fact, the training data on each client usually come from the usage of the terminal device by a particular user. Due to differences in the functions of terminal devices and the usage habits of users, different clients often produce different classes of data, leading to extremely different class distributions across the data on the clients. Consequently, there will be huge differences among the local models trained on such distributions, making it difficult to obtain a superior global model through the traditional approach of element-wise weighted averaging of model parameters. To reduce the impact of class imbalance and distribution differences, in this paper we propose a novel Class-Balanced Federated Learning (CBFL) method based on data generation, which aims to produce a class-balanced data set suitable for the training of the global model for each client through a data generation technique. To this end, CBFL designs a class distribution equalizer that consists of a class-balanced sampler and a data generator. First, the class-balanced sampler samples the classes that have insufficient data on the client with a higher probability. Then, the data generator generates corresponding dummy data according to the classes sampled by the class-balanced sampler. Finally, each client combines its original data and the generated data to produce a class-balanced data set for training. In this way, the performance of each local model can be greatly improved and the differences among local models are highly reduced, which contributes to obtaining a promising global model. Moreover, to obtain high-quality generated data, we exploit global data distribution information from the global model to train the data generator. Extensive experiments on four benchmark datasets demonstrate the superior performance of the proposed method over existing methods. For example, the ResNet20 model trained on the CIFAR-100 dataset by the proposed CBFL outperforms existing methods by 5.82% in terms of accuracy.
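To make the class distribution equalizer concrete, the following is a minimal sketch of the two components the abstract describes: a class-balanced sampler that picks under-represented classes with higher probability, and a class-conditional generator call that produces dummy data for the sampled labels. The generator interface (generator(labels) -> images) and all names here are assumptions for illustration, not the authors' implementation.

```python
# Sketch of the class distribution equalizer: under-represented classes are
# sampled with higher probability, and a conditional generator synthesizes
# dummy samples for the sampled labels. All interfaces are illustrative.
import numpy as np
import torch


def class_sampling_probs(class_counts: np.ndarray) -> np.ndarray:
    """Assign higher sampling probability to classes with fewer local samples."""
    deficit = class_counts.max() - class_counts      # gap to the largest class
    if deficit.sum() == 0:                           # already balanced: uniform
        return np.full(len(class_counts), 1.0 / len(class_counts))
    return deficit / deficit.sum()


def generate_balancing_data(class_counts: np.ndarray, generator, num_dummy: int):
    """Sample scarce classes and synthesize dummy data for them."""
    probs = class_sampling_probs(class_counts)
    labels = np.random.choice(len(class_counts), size=num_dummy, p=probs)
    labels = torch.as_tensor(labels, dtype=torch.long)
    dummy_images = generator(labels)                 # class-conditional generation
    return dummy_images, labels
```

In CBFL, each client would then train on its real data combined with the generated (dummy_images, labels) pairs, yielding the class-balanced training set the abstract describes; per the abstract, the generator itself is trained with global distribution information extracted from the global model, which is not shown in this sketch.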
Authors
李志鹏
国雍
陈耀佛
王耀威
曾炜
谭明奎
LI Zhi-Peng; GUO Yong; CHEN Yao-Fo; WANG Yao-Wei; ZENG Wei; TAN Ming-Kui (School of Software Engineering, South China University of Technology, Guangzhou 510006; Artificial Intelligence Research Center, Peng Cheng Laboratory, Shenzhen, Guangdong 518054; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871)
Source
《计算机学报》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2023, No. 3, pp. 609-625 (17 pages)
Chinese Journal of Computers
Funding
Young Scientists Project of the Ministry of Science and Technology of China (2020AAA0106900)
Joint Funds of the National Natural Science Foundation of China (U20B2052)
National Natural Science Foundation of China (62072190)
Key-Area Research and Development Program of Guangdong Province (2018B010107001)
Pearl River Talent Program of Guangdong Province, Innovative and Entrepreneurial Team Project (2017ZT07X183)
Keywords
federated learning
data generation
class distribution
class imbalance
privacy protection