Recent Progress and Prospect of Multi-Agent Reinforcement Learning under the Perspective of Competition and Cooperation


Abstract: With substantial progress in deep learning and reinforcement learning, multi-agent reinforcement learning (MARL) has become a general approach to large-scale, complex sequential decision-making problems. To advance the field, this paper collects and reviews recent research from the perspective of competition and cooperation. It first introduces single-agent reinforcement learning. It then presents the two basic theoretical frameworks of MARL, Markov games and extensive-form games, with emphasis on the classic algorithms and recent advances in competitive, cooperative, and mixed settings. Finally, it discusses the core challenge of MARL, the non-stationarity of the environment, and uses an example to summarize existing approaches to this challenge and outline future directions.
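The non-stationarity mentioned in the abstract can be illustrated with a minimal sketch. It assumes a one-state, two-player zero-sum matrix game (matching pennies), not the example used in the paper. Agent 1's fixed policy, the learning rate `lr`, the inverse temperature `beta`, and the function names are hypothetical choices made for illustration. Agent 2 learns while agent 1 keeps its own behaviour unchanged. The sketch tracks the expected reward of agent 1's action 0 over time.

```python
import numpy as np

# Matching pennies as a one-state, two-agent Markov game.
# PAYOFF_1[a1, a2] is agent 1's reward; agent 2 receives the negative (zero-sum).
PAYOFF_1 = np.array([[1.0, -1.0],
                     [-1.0, 1.0]])


def softmax(prefs: np.ndarray, beta: float) -> np.ndarray:
    """Boltzmann policy over action preferences with inverse temperature beta."""
    z = np.exp(beta * (prefs - prefs.max()))
    return z / z.sum()


def agent1_view(steps: int = 30, lr: float = 0.3, beta: float = 5.0) -> list[float]:
    """Expected reward of agent 1's action 0 while agent 2 keeps learning.

    Agent 1 plays a fixed (hypothetical) mixed policy; agent 2 runs a noise-free,
    expected-value Q-update against it. The returned list shows how the reward
    of the same action in the same state drifts from agent 1's point of view.
    """
    policy1 = np.array([0.8, 0.2])  # hypothetical fixed policy of agent 1
    q2 = np.zeros(2)                # agent 2's action-value estimates
    history = []
    for _ in range(steps):
        policy2 = softmax(q2, beta)
        # Agent 1's expected reward for action 0 under agent 2's current policy.
        history.append(float(PAYOFF_1[0] @ policy2))
        # Agent 2's expected reward for each of its actions against policy1.
        target2 = -(policy1 @ PAYOFF_1)
        q2 += lr * (target2 - q2)
    return history


if __name__ == "__main__":
    h = agent1_view()
    print(f"first: {h[0]:.3f}, last: {h[-1]:.3f}")
```

In this run, agent 1's expected reward for action 0 falls from 0 toward -1, although agent 1 never changes its own policy. Agent 1 treats agent 2 as part of the environment, so from agent 1's perspective the reward function of that environment has shifted. This moving target is the non-stationarity faced by independent learners.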

Authors: Tian Xiaohe; Li Wei; Xu Zheng; Liu Tianxing; Qi Xiaoya; Gan Zhongxue

Affiliations: Academy for Engineering and Technology, Fudan University, Shanghai 200433, China; Shanghai Engineering Research Center of AI & Robotics, Shanghai 200433, China; Engineering Research Center of AI & Robotics, Ministry of Education, Shanghai 200433, China; Ji Hua Laboratory, Foshan 528000, Guangdong, China; Beijing Deep Singularity Technology Co., Ltd., Beijing 100089, China


Source: Computer Applications and Software (Peking University Core Journal), 2024, No. 4, pp. 1-15 (15 pages)

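Funding: Ji Hua Laboratory Fund Project of Guangdong Province (X190021TB190); Science and Technology Commission of Shanghai Municipality Project (1951113200).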

Keywords: Deep learning; Reinforcement learning; Multi-agent reinforcement learning; Non-stationarity of the environment

CLC number: TP3 [Automation and Computer Technology / Computer Science and Technology]
