Abstract
Aiming at the problems of poor model performance and low training efficiency when AI models that provide intelligent services are trained on streaming data, a high-performance federated continual learning algorithm for heterogeneous streaming data (FCL-HSD) was proposed for distributed terminal systems with privacy-sensitive data. To mitigate the current model's forgetting of old data, a model with a dynamically extensible structure was introduced in the local training stage, together with an extension audit mechanism, so that the AI model retains its ability to recognize old data at a small storage cost. Considering the data heterogeneity across terminals, a customized global model strategy based on data distribution similarity was designed on the central server side, and an aggregation-by-block scheme was applied to the different modules of the model. The feasibility and effectiveness of the proposed algorithm were verified under various data-increment scenarios on different datasets. Experimental results show that, compared with existing works, the proposed algorithm effectively improves the model's ability to classify old data while preserving its ability to classify new data.
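The aggregation-by-block scheme mentioned in the abstract can be illustrated with a minimal sketch: shared modules (e.g. a feature extractor) are averaged across clients in FedAvg fashion, while the remaining blocks stay client-specific. All function and block names below are hypothetical, not taken from the paper.

```python
import numpy as np

def aggregate_by_block(client_models, client_weights, shared_blocks):
    """Sketch of block-wise federated aggregation.

    client_models: list of dicts mapping block name -> np.ndarray parameters
    client_weights: per-client weights (e.g. local sample counts)
    shared_blocks: names of modules aggregated globally; blocks not listed
                   (e.g. personalized classifier heads) are left local.
    """
    total = float(sum(client_weights))
    global_blocks = {}
    for name in shared_blocks:
        # Weighted average of this block's parameters across all clients
        global_blocks[name] = sum(
            (w / total) * model[name]
            for w, model in zip(client_weights, client_models)
        )
    return global_blocks

# Usage: two clients sharing a "feature" block; "head" stays personalized.
clients = [
    {"feature": np.ones(3), "head": np.zeros(2)},
    {"feature": 3.0 * np.ones(3), "head": np.ones(2)},
]
global_part = aggregate_by_block(clients, [1, 1], ["feature"])
```

With equal weights, the shared block becomes the plain mean of the two clients' parameters, while the `head` blocks are never touched by the server.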
Authors
JIANG Hui; HE Tianliu; LIU Min; SUN Sheng; WANG Yuwei
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100190, China; Zhongguancun Laboratory, Beijing 100084, China
Source
Journal on Communications (通信学报), indexed by EI, CSCD, and the Peking University Core list
2023, No. 5, pp. 123-136 (14 pages)
Funding
National Key R&D Program of China (No. 2021YFB2900102)
National Natural Science Foundation of China (No. 62072436)
Keywords
heterogeneous data
streaming data
federated learning
federated continual learning
catastrophic forgetting