Abstract
Pre-trained language models based on super-large-scale raw corpora, represented by BERT and GPT, can make full use of big models, big data, and big computing, and have significantly improved the performance of almost all natural language processing tasks. On some datasets, their performance has reached or even exceeded the human level, and pre-trained language models have become a new paradigm for natural language processing. It is believed that in the future, natural language processing, and even the entire field of artificial intelligence, will continue to move forward along the path of "homogenization" and "scaling", and will integrate more sources of "knowledge", such as multi-modal data, embodied behavior data, and social interaction data, thereby paving the way for achieving true artificial general intelligence.
Authors
CHE Wanxiang; LIU Ting (Harbin Institute of Technology, Harbin 150001, China)
Source
ZTE Technology Journal, 2022, Issue 2, pp. 3-9 (7 pages)
Keywords
artificial intelligence
natural language processing
pre-trained language model
homogenization