Abstract
Training large neural networks is a hot topic in deep learning, and distributed training across multiple nodes is one of the best approaches to it. Distributed training usually combines three forms of parallelism: data parallelism, inter-layer parallelism, and intra-layer parallelism. However, existing frameworks only support manual model partitioning for inter-layer parallelism, which increases the abstraction complexity of model design. To address this issue, we propose a node-constrained relationship search algorithm that automates model partitioning. Moreover, in traditional data parallelism and inter-layer parallelism, computation and communication are strictly serialized because of complex model constraints and the need for communication operations. To overcome this limitation, we introduce a synchronous optimization algorithm that overlaps computation with communication and effectively improves overall training efficiency. Experiments train GPT-2 models of different sizes as well as AlexNet, VGG16, and ResNet50. With the synchronous optimization algorithm, the training performance of GPT2-XL, GPT2-LARGE, and GPT2-MEDIUM improves by factors of 1.14, 1.18, and 1.23, respectively, under a 6-node configuration, and that of AlexNet, VGG16, and ResNet50 improves by factors of 1.31, 1.14, and 1.03, respectively, under a 1-node configuration. The results show that the synchronous optimization algorithm effectively improves training efficiency in hybrid parallelism.
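To make the idea of overlapping gradient communication with computation concrete, the following is a minimal sketch in PyTorch, not the paper's synchronous optimization algorithm: it launches an asynchronous all-reduce for each parameter as soon as its gradient has been accumulated, so the collective runs while autograd is still producing gradients for earlier layers. The function names, the training loop, and the use of per-parameter hooks are illustrative assumptions; `torch.distributed` is assumed to be initialized (e.g. via `torchrun`).

```python
# Minimal sketch (not the paper's implementation): overlap gradient all-reduce
# with the remaining backward computation using per-parameter gradient hooks.
# Requires PyTorch >= 2.1 (register_post_accumulate_grad_hook) and an
# initialized process group (e.g. launched with torchrun, NCCL or Gloo backend).
import torch
import torch.distributed as dist


def attach_overlapped_allreduce(model: torch.nn.Module, handles: list):
    """Start an async all-reduce as soon as each parameter's gradient is ready,
    so communication proceeds while earlier layers still compute gradients."""
    def hook(param: torch.Tensor):
        # async_op=True returns immediately; the collective runs in the background.
        work = dist.all_reduce(param.grad, op=dist.ReduceOp.SUM, async_op=True)
        handles.append((work, param))

    for p in model.parameters():
        if p.requires_grad:
            p.register_post_accumulate_grad_hook(hook)


def finish_gradient_sync(handles: list, world_size: int):
    """Wait for all outstanding all-reduces and average gradients before stepping."""
    for work, param in handles:
        work.wait()
        param.grad.div_(world_size)
    handles.clear()


def train_step(model, optimizer, loss_fn, inputs, targets, handles):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()  # hooks fire during backward, overlapping communication with compute
    finish_gradient_sync(handles, dist.get_world_size())
    optimizer.step()
    return loss
```

A real hybrid-parallel system additionally has to coordinate this with inter-layer (pipeline) communication and typically buckets small gradients together, as torch's DistributedDataParallel does; the sketch only illustrates why overlapping communication with the backward pass can hide part of the gradient synchronization cost.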
Authors
XU Jinlong (徐金龙); LI Pengfei (李鹏飞); LI Jianan (李嘉楠); CHEN Biaoyuan (陈飙元); GAO Wei (高伟); HAN Lin (韩林)
National Supercomputing Center in Zhengzhou (Zhengzhou University), Zhengzhou 450000, China; School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China; Strategic Support Force Information Engineering University, Zhengzhou 450000, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2024, Issue 12, pp. 120-128 (9 pages)
Funding
Major Science and Technology Project of Henan Province (221100210600).
Keywords
Distributed training
Hybrid parallelism
Automatic partitioning
Communication optimization
Gradient synchronization