Journal Articles
2 articles found
1. 并行智能训练技术:挑战与发展 (Parallel Intelligent Training Technology: Challenges and Development) [Citations: 2]
Authors: 卢凯, 赖志权, 李笙维, 柳炜杰, 葛可适, 卢锡城, 李东升. 《中国科学:信息科学》 (SCIENTIA SINICA Informationis; CSCD, Peking University Core), 2023, Issue 8, pp. 1441-1468 (28 pages).
In recent years, artificial intelligence technology represented by deep learning has advanced rapidly, and the scale of both deep learning models and training data has grown explosively, posing enormous challenges to intelligent model training systems. With the ever deeper integration of high-performance computing and artificial intelligence, parallel intelligent training has become the principal approach to efficiently training large-scale deep learning models. This paper summarizes the basic modes and key techniques of parallel intelligent training and the current state of parallel training frameworks, analyzes the challenges and development trends facing parallel training techniques and frameworks, and briefly introduces research progress on the 银河天璇 (Yinhe Tianxuan) parallel intelligent training framework.
Keywords: intelligent training; high-performance computing; parallel intelligent training; deep learning
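The basic modes the survey refers to conventionally include data, model, and pipeline parallelism, with data parallelism the most widely deployed: each GPU holds a full model replica, and gradients are averaged across workers after every backward pass. The following is a minimal illustrative sketch, not code from the paper; it assumes PyTorch with the NCCL backend and a launch such as `torchrun --nproc_per_node=4 ddp_sketch.py` (script name hypothetical).

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(1024, 10).cuda(local_rank)  # stand-in for a real model
    model = DDP(model, device_ids=[local_rank])   # replicates the model; all-reduces gradients
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

    for _ in range(10):                           # stand-in training loop on random data
        x = torch.randn(32, 1024, device=local_rank)
        y = torch.randint(0, 10, (32,), device=local_rank)
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()                           # DDP overlaps the gradient all-reduce with backprop
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()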
2. Increasing Momentum-Like Factors: A Method for Reducing Training Errors on Multiple GPUs [Citations: 1]
Authors: Yu Tang, Zhigang Kan, Lujia Yin, Zhiquan Lai, Zhaoning Zhang, Linbo Qiao, Dongsheng Li. 《Tsinghua Science and Technology》 (SCIE, EI, CAS, CSCD), 2022, Issue 1, pp. 114-126 (13 pages).
In distributed training, increasing the batch size can improve parallelism, but it can also bring many difficulties to the training process and cause training errors. In this work, we investigate the occurrence of training errors in theory and train ResNet-50 on CIFAR-10 using Stochastic Gradient Descent (SGD) and Adaptive moment estimation (Adam), keeping the total batch size in the parameter server constant while lowering the batch size on each Graphics Processing Unit (GPU). A new method that considers momentum to eliminate training errors in distributed training is proposed. We define a Momentum-like Factor (MF) to represent the influence of former gradients on parameter updates in each iteration. Then, we modify the MF values and conduct experiments to explore how different MF values influence the training performance based on SGD, Adam, and Nesterov accelerated gradient. Experimental results reveal that increasing MFs is a reliable method for reducing training errors in distributed training. An analysis of convergence conditions in distributed training, taking a large batch size and multiple GPUs into account, is also presented.
Keywords: multiple Graphics Processing Units (GPUs); batch size; training error; distributed training; momentum-like factors
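The abstract defines MF only informally, as the influence of former gradients on each update; the paper's exact formula is not given here. The sketch below assumes MF plays the role of the classical momentum coefficient in heavy-ball SGD (an assumption for illustration, not the paper's definition) and sweeps it on a toy objective to show how a larger factor gives past gradients more weight.

import numpy as np

def sgd_momentum_step(theta, velocity, grad, lr=0.1, mf=0.9):
    # Heavy-ball update: `mf` weights the accumulated former gradients,
    # so a larger mf lets past gradients influence the step more.
    velocity = mf * velocity + grad
    return theta - lr * velocity, velocity

# Toy objective f(theta) = 0.5 * theta^2, whose gradient is theta itself.
for mf in (0.0, 0.5, 0.9):          # sweep the momentum-like factor
    theta, v = np.array([5.0]), np.zeros(1)
    for _ in range(50):
        theta, v = sgd_momentum_step(theta, v, grad=theta, mf=mf)
    print(f"mf={mf}: |theta| after 50 steps = {abs(theta[0]):.4f}")

In a data-parallel setting, each worker would apply the same update to its replica after gradient averaging; the paper's experiments sweep MF values in this spirit across SGD, Adam, and Nesterov accelerated gradient.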