
Inspiration of GPT-4 on Multimodal Foundation Models in Multimodal Understanding, Generation, and Interaction
Abstract: ChatGPT, a conversational chatbot, has swept across society with almost unstoppable momentum, heralding the dawn of artificial general intelligence. Its upgraded version, GPT-4, is a multimodal foundation model that goes beyond text-only interaction and accepts combinations of text and images as multimodal input. Compared with traditional unimodal foundation models, multimodal foundation models are more consistent with the multi-channel way in which humans perceive and understand the world, and can handle more complex and varied environments, scenes, and tasks. GPT-4 shows that incorporating natural language understanding and generation abilities grounded in human knowledge into multimodal foundation models can greatly improve their abilities in multimodal understanding, generation, and interaction. This article introduces the concept, key technologies, recent progress, and application scenarios of multimodal foundation models, as well as the technical characteristics of GPT-4, and focuses on several inspirations that large language models such as GPT-4 offer for building multimodal foundation models. Specifically, it discusses how to fully leverage the language capabilities of large language models when building multimodal foundation models, so that with the help of language they can better perceive and understand the world, create and generate content, and interact with humans and the environment.
Authors: Jing Liu (刘静); Longteng Guo (郭龙腾). Affiliations: Institute of Automation, Chinese Academy of Sciences, Beijing 100190; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100090
Source: Bulletin of National Natural Science Foundation of China (《中国科学基金》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 5, pp. 793-802 (10 pages)
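Funding: Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0118801) and the National Natural Science Foundation of China (U21B2043).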
Keywords: GPT-4; multimodal foundation models; multimodal understanding; multimodal generation; multimodal interaction