
A review of distributed training methods for large language models
Abstract: With the advent of ChatGPT, various Large Language Model (LLM) products have been emerging, and an era of large models is arriving. However, because large models involve enormous parameter counts and long training times, existing training methods for traditional machine learning models are not suitable for them, and new distributed training methods and strategies urgently need to be explored. To address these problems, this paper reviews the progress of distributed training methods for large models over the past decade or so from three aspects: distributed training architectures, parallel acceleration strategies, and memory and computation optimization. Finally, it proposes research directions that can be explored in the future.
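As an illustration of the parallel acceleration strategies such a survey covers, the sketch below shows basic data parallelism using PyTorch's DistributedDataParallel. It is a minimal example, not code from the paper; the toy model, batch size, and learning rate are hypothetical placeholders.

```python
# Minimal data-parallel training sketch (assumes PyTorch and the
# torch.distributed / DistributedDataParallel API; model and
# hyperparameters are hypothetical placeholders).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Typically launched with `torchrun --nproc_per_node=N train.py`;
    # torchrun sets the RANK / LOCAL_RANK / WORLD_SIZE environment variables.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model standing in for a large language model.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # wraps the replica for gradient sync

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for step in range(10):
        x = torch.randn(8, 1024, device=local_rank)  # each rank trains on its own data shard
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()   # DDP all-reduces gradients across ranks here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

In this scheme every process holds a full model replica and gradients are averaged across ranks during the backward pass, which is the basic data-parallel strategy that tensor, pipeline, and ZeRO-style sharded approaches build on.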
Author: JIANG Fengze (蒋丰泽), School of Software Engineering, Shenzhen Institute of Information Technology, Shenzhen, Guangdong 518172, China
Source: Journal of Shenzhen Institute of Information Technology, 2023, No. 6, pp. 9-15 (7 pages)
Keywords: large language model; distributed training; parallel processing
