Abstract
GPU virtualization is driving the evolution of cloud server offerings. In distributed machine learning, optimization training is completed locally, the results are aggregated over communication links, and the next training iteration begins. By dividing the functions of distributed machine learning into modules, this paper identifies communication performance as the key constraint on its computing power. It first compares the communication performance of hierarchical and flat synchronization algorithms, then uses the global synchronization time (GST) as the characterizing parameter to compare the advantages and disadvantages, deployment difficulty, and applicable scenarios of different communication algorithms.
Authors
Fan Ya-na (范亚娜); Li Yuan (李媛); Zhai Bin (翟斌)
Beijing Guodiantong Network Technology Co., Ltd., Beijing 100070, China
Source
Technology and Information (《科学与信息化》)
2024, No. 17, pp. 46-48 (3 pages)
Keywords
GPU
server
machine learning
communication frequency