Abstract
Generative foundation models are driving profound transformations in artificial intelligence, demonstrating general-purpose capabilities in tasks such as natural language processing, multimodal understanding, and content synthesis. Large models deployed on the cloud side provide general intelligent services but face key challenges such as high latency and insufficient personalization; small models deployed on the edge side capture personalized scenario data but suffer from limited generalization. Cloud-edge collaboration between large and small models aims to combine the general capabilities of large models with the specialized capabilities of small models, enabling them to learn and evolve through collaborative interaction and thereby empower downstream vertical industry scenarios. Taking large language models and large multimodal models as representatives, this paper reviews the mainstream architectures, typical pre-training techniques, and adaptation and fine-tuning methods of generative foundation models; introduces the development history and recent research on key model compression techniques for large models, including model pruning, model quantization, and knowledge distillation; and, based on differences in collaboration purposes and mechanisms among models, proposes a co-evolution taxonomy of large-small model collaboration comprising collaborative training, collaborative inference, and collaborative planning, summarizing a series of representative new techniques and ideas such as bidirectional cloud-edge model distillation, modular design, and generative agents. Overall, this paper examines the international and domestic state of the art in large-small model co-evolution from three perspectives: generative foundation models, model compression techniques, and cloud-edge collaboration between large and small models; compares strengths and gaps; and analyzes development trends in foundation-model empowerment in terms of application prospects, model architecture design, vertical-domain model fusion, personalization, and security and trustworthiness challenges.
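The adaptation and fine-tuning methods mentioned above include parameter-efficient techniques such as low-rank adaptation (LoRA). As a minimal illustrative sketch of that idea (not code from the surveyed paper; the module and dimensions here are hypothetical), the pretrained weight is frozen and only a low-rank update is trained:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer with a trainable low-rank update:
    y = Wx + (alpha / r) * B(Ax).
    Only A and B, i.e., r * (d_in + d_out) parameters, are trained,
    instead of the full d_in * d_out weight matrix."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze pretrained weights
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        # B starts at zero, so training begins exactly at the pretrained model.
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Toy usage: wrap one projection layer of a "pretrained" model.
layer = LoRALinear(nn.Linear(768, 768), r=8)
out = layer(torch.randn(2, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 12288 trainable vs. 590592 frozen parameters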
Generative foundation models are driving significant transformations in the field of artificial intelligence. They demonstrate general intelligence across diverse research fields, including natural language processing, multimodal content understanding, and the synthesis of imagery and other multimodal content. Generative foundation models often consist of billions or even hundreds of billions of parameters. Thus, they are typically deployed on the cloud side to provide powerful and general intelligent services. However, this type of service faces crucial challenges in practice, such as high latency induced by communication between the cloud and local devices, and insufficient personalization because servers often cannot access local data owing to privacy concerns. By contrast, low-complexity lightweight models are located at the edge side to capture personalized and dynamic scenario data, but they may suffer from poor generalization. Large and lightweight (or large-small) model collaboration aims to integrate the general intelligence of large foundation models with the personalized intelligence of small lightweight models. This integration empowers downstream vertical, domain-specific applications through the interaction and collaboration of both types of intelligent models. Large-small model collaboration has recently attracted increasing attention, has become a focus of research and development in academia and industry, and has been predicted to be an important technological trend. We therefore investigate this area thoroughly, highlighting recent progress and offering potential inspiration for related research. In this study, we first give an overview of representative large language models (LLMs) and large multimodal models. We focus on their mainstream Transformer-based architectures, including encoder-only, decoder-only, and encoder-decoder models. Corresponding pre-training technologies, such as next sentence prediction, sequence-to-sequence modeling, and contrastive learning, as well as parameter-efficient fine-tuning methods with representatives including low-rank adaptation and prompt tuning, are also explored. We then review the development history and the latest advances of model compression techniques, including model pruning, model quantization, and knowledge distillation, in the era of foundation models. Based on differences in model collaboration purposes and mechanisms, we propose a new classification method and taxonomy for the study of large-small model collaboration, namely, collaborative training, collaborative inference, and collaborative planning. Specifically, we summarize recent and representative methods, including dual-directional knowledge distillation between large models on the cloud side and small models deployed at the edge side, modular design of intelligent models that splits functional modules between the cloud and the edge, and generative agents that collaborate to complete complex tasks in an autonomous and intelligent manner. In collaborative training, a main challenge is dealing with the heterogeneity of data distributions and model architectures between the cloud and client sides. Data privacy may also be a concern during collaborative training, particularly in privacy-sensitive cases. Despite much progress in collaborative inference, automatically slicing a complicated task and completing it collectively remain challenging, and the communication cost between computing facilities may be another concern. Collaborative planning is a new paradigm that has gained attention with the increasing study and promising progress of LLM-centric agents (LLM agents). This paradigm often involves multiple LLM agents that compete or cooperate to complete a challenging task. It typically leverages emergent capabilities of LLMs, such as in-context learning and chain-of-thought reasoning, to automatically divide a complicated task into several subtasks. By completing and assembling the subtasks, the global task can be accomplished in a collaborative manner. This scheme finds diverse applications, such as developing games and simulating social communities. However, it may suffer from drawbacks inherent in LLMs, including hallucination and adversarial vulnerability; thus, more robust and reliable collaborative planning schemes remain to be investigated. In summary, this work surveys large-small model collaboration techniques from the perspectives of generative foundation models, model compression, and heterogeneous model collaboration via LLM agents. It also compares the advantages and disadvantages of international and domestic technology developments in this research realm. We conclude that, although the gaps between domestic and advanced international studies in this area are narrowing, particularly for newly emerging LLM agents, original and major breakthroughs may still be lacking. Certain notable advantages of domestic progress are closely related to industrial applications, owing to rich data resources from industry; consequently, the development of domain-specific LLMs is well advanced. In addition, this study envisions applications of large-small model collaboration and discusses key challenges and promising directions. 1) The design of efficient model architectures includes developing new architectures that achieve low-complexity inference while maintaining long-sequence modeling abilities comparable to Transformers, as well as further improving the scalability of mixture-of-experts-based architectures. 2) Current model compression methods are mainly designed for vision models; thus, developing techniques tailored to LLMs and large multimodal models is important to preserve their emergent abilities during compression. 3) Existing personalization methods mainly focus on discriminative models, and due attention needs to be paid to efficient personalization of generative foundation models. 4) Generative intelligence often suffers from fraudulent content (e.g., generated fake imagery, deepfake videos, and fake news) and various types of attacks (e.g., adversarial attacks, jailbreaking attacks, and Byzantine attacks), so security and trustworthiness issues arise in practical applications. Therefore, this study also advocates deeper investigation of these emerging security threats and the development of effective defenses against these crucial issues during large-small model collaboration, so as to empower vertical domains more safely.
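To make the dual-directional cloud-edge distillation idea concrete, the sketch below pairs a large cloud-side model with a small edge-side model and exchanges temperature-scaled soft labels in both directions. This is a minimal illustration under assumed conditions (toy models, shared label space, logits exchanged rather than raw data); the actual methods surveyed in the paper differ in detail:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-label distillation: KL divergence between temperature-scaled
    teacher and student distributions (Hinton et al., 2015)."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # T^2 rescales gradients to the magnitude of the hard-label loss.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

# Illustrative stand-ins: a cloud-side large model and an edge-side small
# model; in practice these would be a foundation model and a lightweight
# on-device network.
cloud_model = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10))
edge_model = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 10))
opt_edge = torch.optim.Adam(edge_model.parameters(), lr=1e-3)
opt_cloud = torch.optim.Adam(cloud_model.parameters(), lr=1e-4)

x, y = torch.randn(16, 128), torch.randint(0, 10, (16,))  # toy batch

# Direction 1: cloud -> edge. The large model's general knowledge
# supervises the small model alongside the hard labels.
edge_logits = edge_model(x)
with torch.no_grad():
    cloud_logits = cloud_model(x)
loss_edge = F.cross_entropy(edge_logits, y) \
    + distillation_loss(edge_logits, cloud_logits)
opt_edge.zero_grad()
loss_edge.backward()
opt_edge.step()

# Direction 2: edge -> cloud. Soft labels from the edge model feed
# personalized, scenario-specific knowledge back to the cloud; in
# deployment only logits, not raw local data, would cross the network.
cloud_logits = cloud_model(x)
with torch.no_grad():
    edge_logits = edge_model(x)
loss_cloud = F.cross_entropy(cloud_logits, y) \
    + distillation_loss(cloud_logits, edge_logits)
opt_cloud.zero_grad()
loss_cloud.backward()
opt_cloud.step()
```

In this scheme, the edge-to-cloud direction is what distinguishes dual-directional distillation from classic one-way compression: the cloud model is also a student, absorbing personalized knowledge it cannot observe directly for privacy reasons.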
Authors
王永威
沈弢
张圣宇
吴帆
赵洲
蔡海滨
吕承飞
马利庄
杨承磊
吴飞
Wang Yongwei; Shen Tao; Zhang Shengyu; Wu Fan; Zhao Zhou; Cai Haibin; Lyu Chengfei; Ma Lizhuang; Yang Chenglei; Wu Fei (Institute of Artificial Intelligence, Zhejiang University, Hangzhou 310058, China; Shanghai Institute for Advanced Study, Zhejiang University, Shanghai 201203, China; Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200241, China; School of Software Engineering, East China Normal University, Shanghai 200062, China; Taobao (China) Software Co., Ltd., Hangzhou 310023, China; School of Software, Shandong University, Jinan 250011, China)
Source
《中国图象图形学报》
CSCD
Peking University Core Journals (北大核心)
2024, No. 6, pp. 1510-1534 (25 pages)
Journal of Image and Graphics
Funding
National Science and Technology Major Project of New Generation Artificial Intelligence (2022ZD0119100)
National Natural Science Foundation of China (62037001, 62441605)
Zhejiang Provincial Science and Technology Program (2022C01044)
Starry Night Science Fund (Zhejiang University)