Abstract
Vertical federated learning (VFL) improves the utilization value of data by combining local data features from multiple parties to jointly train a target model without leaking data privacy, and it has received widespread attention from companies and institutions in industry. During training, the intermediate embeddings uploaded by clients and the gradients returned by the server incur a huge communication volume, so communication cost becomes a key bottleneck limiting the practical application of VFL. Designing effective algorithms that reduce communication volume and improve communication efficiency has therefore become a research focus. To improve the communication efficiency of VFL, this study proposes an efficient compression algorithm based on bidirectional compression of embeddings and gradients. For the embedding representations uploaded by clients, it employs an improved sparsification method combined with a cache reuse mechanism; for the gradient information distributed by the server, it uses a mechanism combining discrete quantization with Huffman coding. Experimental results show that the proposed algorithm reduces communication volume by about 85%, improves communication efficiency, and shortens the overall training time while maintaining accuracy comparable to the uncompressed scenario.
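The abstract names the two compression directions but not their exact form. Below is a minimal, hypothetical Python sketch of what such a pipeline could look like: top-k sparsification with a synchronized cache for client embeddings, and uniform quantization followed by Huffman coding for server gradients. All function names, the top-k selection rule, and the cache-synchronization protocol are illustrative assumptions, not the paper's actual algorithm.

```python
import heapq
from collections import Counter

import numpy as np


def sparsify_with_cache(embedding, cache, k):
    """Client side (hypothetical): transmit only the k coordinates that
    changed most since the last upload; both parties keep `cache` in sync,
    so the server reuses cached values for coordinates that are not sent."""
    delta = np.abs(embedding - cache)
    idx = np.argpartition(delta, -k)[-k:]   # top-k most-changed coordinates
    cache[idx] = embedding[idx]             # update the client's cache copy
    return idx, embedding[idx]              # payload: indices + values


def quantize(grad, levels=16):
    """Server side, step 1 (hypothetical): map gradients onto `levels`
    uniform bins; `lo` and `scale` let the client dequantize."""
    lo, hi = float(grad.min()), float(grad.max())
    scale = (hi - lo) / (levels - 1) or 1.0  # guard against constant gradients
    symbols = np.rint((grad - lo) / scale).astype(int)
    return symbols, lo, scale


def huffman_codebook(symbols):
    """Server side, step 2: entropy-code the quantized symbols, so frequent
    bins (gradients cluster near zero) get short bit strings."""
    heap = [[freq, [sym, ""]] for sym, freq in Counter(symbols.tolist()).items()]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate one-symbol stream
        return {heap[0][1][0]: "0"}
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        for pair in a[1:]:
            pair[1] = "0" + pair[1]
        for pair in b[1:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [a[0] + b[0]] + a[1:] + b[1:])
    return {sym: code for sym, code in heap[0][1:]}


# Example round trip for one mini-batch (synthetic data):
emb = np.random.randn(128).astype(np.float32)
cache = np.zeros_like(emb)
idx, vals = sparsify_with_cache(emb, cache, k=16)   # send 16 of 128 entries

grad = (np.random.randn(128) * 0.01).astype(np.float32)
syms, lo, scale = quantize(grad)
book = huffman_codebook(syms)
bits = "".join(book[s] for s in syms.tolist())      # compressed bitstream
approx = syms * scale + lo                           # client-side dequantization
```

Under this sketch, the upload shrinks because only changed coordinates travel, and the download shrinks because quantized gradients concentrate in a few bins that Huffman coding maps to short codes; how the actual paper selects coordinates and bins may differ.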
Authors
ZHANG Yu-Hang (张宇航), SONG Tian (嵩天)
(School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; School of Cyberspace Science and Technology, Beijing Institute of Technology, Beijing 100081, China)
Source
Computer Systems & Applications (《计算机系统应用》), 2024, No. 10, pp. 190-197 (8 pages)
Funding
National Key Research and Development Program of China (2022YFC3303500).
Keywords
vertical federated learning(VFL)
communication efficiency
embedding compression
gradient compression
sparsification
quantization