
Study on Distributed Training Optimization Based on Hybrid Parallel
Abstract: Training large neural networks is a hot topic in deep learning, and distributed training across multiple nodes is one of the best ways to carry it out. Distributed training generally combines three parallelism strategies: data parallelism, inter-layer parallelism, and intra-layer parallelism. However, existing frameworks support only manual model partitioning for inter-layer parallelism, which increases the abstraction complexity of model design. To address this issue, a node-constraint relationship search algorithm is proposed to automate model partitioning. Moreover, in traditional data parallelism and inter-layer parallelism, computation and communication are strictly serialized because of complex model constraints and the required communication operations. A synchronization optimization algorithm is therefore introduced to overlap computation and communication, effectively improving overall training efficiency. The experiments train GPT-2 models of different sizes as well as AlexNet, VGG16, and ResNet50. With the synchronization optimization algorithm, the training performance of GPT2-XL, GPT2-LARGE, and GPT2-MEDIUM improves by factors of 1.14, 1.18, and 1.23 on 6 nodes, and that of AlexNet, VGG16, and ResNet50 improves by factors of 1.31, 1.14, and 1.03 on a single node. The results show that the synchronization optimization algorithm effectively improves training efficiency in hybrid parallelism.
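The core idea behind the computation-communication overlap described above can be illustrated for the data-parallel case: instead of waiting for the entire backward pass before synchronizing gradients, each gradient's all-reduce is launched asynchronously as soon as that gradient is produced. The following is a minimal PyTorch sketch of this general technique, not the paper's implementation; the function overlap_gradient_sync and its internals are hypothetical names introduced for illustration, and it assumes one backward pass per step (no gradient accumulation).

```python
import torch
import torch.distributed as dist

def overlap_gradient_sync(model: torch.nn.Module, world_size: int):
    """Register per-parameter hooks that all-reduce each gradient asynchronously
    as soon as it is computed, so communication overlaps with the remaining
    backward computation. Returns a finalize() callable to run before optimizer.step()."""
    pending = []  # (parameter, async work handle, communication buffer)

    def make_hook(param):
        def hook(grad):
            # Copy the gradient into a buffer and launch a non-blocking all-reduce;
            # the backward pass keeps running while the reduction is in flight.
            buf = grad.detach().clone()
            work = dist.all_reduce(buf, op=dist.ReduceOp.SUM, async_op=True)
            pending.append((param, work, buf))
            return grad
        return hook

    for p in model.parameters():
        if p.requires_grad:
            p.register_hook(make_hook(p))

    def finalize():
        # Drain all outstanding reductions, average, and install the results.
        for param, work, buf in pending:
            work.wait()
            param.grad = buf.div_(world_size)
        pending.clear()

    return finalize
```

In a training step (after dist.init_process_group has been called and the hooks registered once at setup), finalize() is invoked after loss.backward() and before optimizer.step(), so the gradient all-reduces run concurrently with the tail of the backward pass rather than strictly after it.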
Authors: XU Jinlong, LI Pengfei, LI Jianan, CHEN Biaoyuan, GAO Wei, HAN Lin (National Supercomputing Center in Zhengzhou (Zhengzhou University), Zhengzhou 450000, China; School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China; Strategic Support Force Information Engineering University, Zhengzhou 450000, China)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2024, Issue 12, pp. 120-128 (9 pages)
Funding: Major Science and Technology Project of Henan Province (221100210600).
Keywords: distributed training; hybrid parallelism; automatic partitioning; communication optimization; gradient synchronization