Abstract
In recent years, the emergence and development of large language models (LLMs) have had a transformative impact on natural language processing and artificial intelligence. As model parameters and training data scale up, the perplexity of language models decreases in a predictable manner, accompanied by steady performance gains on a wide range of natural language processing tasks. Scaling up language models has therefore become a promising way to improve system intelligence. In this survey, we first review the definition and scope of LLMs and provide criteria for distinguishing "large" language models from the perspectives of performance and computing requirements. Then, we review the development and representative work of LLMs along three dimensions: data, algorithms, and model architecture, showing how scaling in these dimensions drives the development of LLMs at different stages. Next, we discuss the emergent abilities of LLMs and possible interpretations behind them. We highlight three key emergent abilities, i.e., chain-of-thought prompting, in-context learning, and instruction following, and introduce their related advances and applications. Finally, we outline potential directions and challenges for LLMs.
Authors
舒文韬 (Shu Wentao); 李睿潇 (Li Ruixiao); 孙天祥 (Sun Tianxiang); 黄萱菁 (Huang Xuanjing); 邱锡鹏 (Qiu Xipeng) — School of Computer Science, Fudan University, Shanghai 200433
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journal (北大核心)
2024, No. 2, pp. 351-361 (11 pages)
Journal of Computer Research and Development
Keywords
natural language processing
neural networks
large language models
pre-training
alignment