
A review of distributed training methods for large language models
Abstract: With the advent of ChatGPT, various Large Language Model (LLM) products have been emerging, and an era of large models is arriving. However, because large models involve enormous parameter counts and long training times, existing training methods for traditional machine learning models are not suitable for them, and new distributed training methods and strategies urgently need to be explored. To address these problems, this paper reviews the progress of distributed training methods for large models over the past decade or so from three aspects: distributed training architectures, parallel acceleration strategies, and memory and computation optimization. Finally, it proposes research directions that can be explored in the future.
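As an illustration of the parallel acceleration strategies such a survey covers, the sketch below shows basic data parallelism using PyTorch's DistributedDataParallel. It is a minimal example, not code from the paper; the toy model, batch size, and learning rate are hypothetical placeholders.

```python
# Minimal data-parallel training sketch (assumes PyTorch and the
# torch.distributed / DistributedDataParallel API; model and
# hyperparameters are hypothetical placeholders).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Typically launched with `torchrun --nproc_per_node=N train.py`;
    # torchrun sets the RANK / LOCAL_RANK / WORLD_SIZE environment variables.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model standing in for a large language model.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # wraps the replica for gradient sync

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for step in range(10):
        x = torch.randn(8, 1024, device=local_rank)  # each rank trains on its own data shard
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()   # DDP all-reduces gradients across ranks here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

In this scheme every process holds a full model replica and gradients are averaged across ranks during the backward pass, which is the basic data-parallel strategy that tensor, pipeline, and ZeRO-style sharded approaches build on.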
Author: JIANG Fengze (蒋丰泽), School of Software Engineering, Shenzhen Institute of Information Technology, Shenzhen, Guangdong 518172, China
Source: Journal of Shenzhen Institute of Information Technology, 2023, No. 6, pp. 9-15 (7 pages)
Keywords: large language model; distributed training; parallel processing
