Abstract: With the advent of ChatGPT, large language model (LLM) products have emerged in quick succession, and an era of large models is dawning. However, because large models involve enormous parameter counts and long training times, existing training methods for traditional machine-learning models do not apply to them, and new distributed training methods and strategies urgently need to be explored. Addressing these problems, this paper surveys the progress of distributed training methods for large models over the past decade or so from three aspects: distributed training architectures, parallel acceleration strategies, and memory and computation optimization. Finally, it proposes research directions worth exploring in the future.
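To make the surveyed scope concrete, below is a minimal sketch of data parallelism, one common parallel acceleration strategy for large-model training. It is an illustrative example, not code from any surveyed system; it assumes PyTorch's DistributedDataParallel and a launcher such as torchrun that sets the rank and world-size environment variables.

```python
# Illustrative data parallelism with PyTorch DDP (assumed setup: torchrun).
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="gloo")   # use "nccl" on multi-GPU nodes
    model = torch.nn.Linear(512, 512)         # stand-in for a large model
    model = DDP(model)                        # replicates weights across ranks
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)

    x = torch.randn(8, 512)                   # each rank trains on its own data shard
    loss = model(x).pow(2).mean()
    loss.backward()                           # DDP all-reduces (averages) gradients
    opt.step()                                # replicas stay in sync after the update

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched as, e.g., `torchrun --nproc_per_node=4 train.py`, each process holds a full model replica and only the gradient all-reduce crosses workers; tensor and pipeline parallelism, by contrast, split the model itself.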
Funding: This work was partially supported by the National 863 High Technical Grant 863-306-101 and the National Doctoral Subject Foundation Grant 0249136.
Abstract: In this paper, we focus on the compiling implementation of the parallel logic language PARLOG and the functional language ML on distributed-memory multiprocessors. Under the graph rewriting framework, a Heterogeneous Parallel Graph Rewriting Execution Model (HPGREM) is presented first. Then, based on HPGREM, a parallel abstract machine, PAM/TGR, is described. Furthermore, several optimizing compilation schemes for executing declarative programs on a transputer array are proposed. Performance statistics on a transputer array demonstrate the effectiveness of our model, parallel abstract machine, optimizing compilation strategies, and compiler.
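For readers unfamiliar with the underlying framework, the following is a toy sketch of term-graph rewriting, the general execution model this abstract builds on. It is not HPGREM or PAM/TGR (which target heterogeneous parallel reduction on transputer arrays); the node representation and function names here are hypothetical, chosen only to show the key idea that shared subgraphs are reduced once and the result is seen by all parents.

```python
# Toy term-graph rewriting: a node is a mutable list [tag, *children];
# sharing means two parents reference the same list object.

def reduce_node(node):
    """Rewrite Add(Num a, Num b) in place to Num(a+b); return True if the rule fired."""
    if node[0] == "Add":
        left, right = node[1], node[2]
        if left[0] == "Num" and right[0] == "Num":
            node[:] = ["Num", left[1] + right[1]]  # in-place overwrite: all parents see it
            return True
    return False

def normalize(node):
    """Reduce a graph to normal form, innermost redexes first."""
    for child in node[1:]:
        if isinstance(child, list):
            normalize(child)
    reduce_node(node)

shared = ["Add", ["Num", 1], ["Num", 2]]
root = ["Add", shared, shared]   # the subterm is shared, so it is reduced only once
normalize(root)
print(root)                      # ['Num', 6]
```

In a parallel graph-rewriting machine, independent redexes like the two children of a non-shared node can be reduced by different processors, which is what makes the framework a natural fit for declarative languages on multiprocessors.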