Multi-Stage Reinforcement Learning for Multi-Agent Collaborative Decision-Making

Abstract: To address low sample efficiency caused by sparse team rewards, ineffective exploration, and sensitivity to parameters in multi-agent collaborative training, this study introduces a staged training scheme on top of the MAPPO algorithm and proposes MSMAC, a multi-agent collaboration algorithm based on multi-stage reinforcement learning. The algorithm divides training into two stages: the first builds single-agent policy networks optimized with an evolutionary strategy, and the second trains the multi-agent policy networks cooperatively. Experimental results in the multi-agent particle environment show that the multi-stage algorithm improves cooperative performance and also increases sample efficiency and model convergence speed.
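The two-stage idea in the abstract can be illustrated with a minimal sketch. It covers only the first stage: a basic evolution-strategies loop that optimizes each agent separately on a dense individual reward, so that the sparse team reward is already non-zero when cooperative training would begin. Everything here is an assumption made for illustration and is not taken from the paper: the toy landmark task, the function names `es_optimize`, `individual_reward` and `team_reward`, and all hyperparameter values. The second stage (cooperative training based on MAPPO) is not implemented.

```python
import random


def es_optimize(fitness, theta, iters=200, pop=20, sigma=0.1, lr=0.05, seed=0):
    """Basic evolution-strategies loop (illustrative, not the paper's variant)."""
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(iters):
        # Sample Gaussian perturbations and score each perturbed parameter vector.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop)]
        scores = [
            fitness([t + sigma * n for t, n in zip(theta, eps)]) for eps in noises
        ]
        baseline = sum(scores) / pop
        # Gradient estimate: noise weighted by baseline-subtracted fitness.
        for d in range(dim):
            grad = sum(
                (s - baseline) * eps[d] for s, eps in zip(scores, noises)
            ) / (pop * sigma)
            theta[d] += lr * grad
    return theta


def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def individual_reward(target):
    """Dense per-agent reward: negative squared distance to its own landmark."""
    return lambda theta: -distance(theta, target) ** 2


def team_reward(thetas, targets, radius=0.1):
    """Sparse team reward: 1.0 only if every agent is within `radius`."""
    done = all(distance(th, tg) <= radius for th, tg in zip(thetas, targets))
    return 1.0 if done else 0.0


# Hypothetical toy task: three agents, each with a 2-D landmark to reach.
targets = [(1.0, -0.5), (-0.8, 0.6), (0.3, 0.9)]
init = [[0.0, 0.0] for _ in targets]

# Stage 1: optimize every agent independently on its dense individual reward.
pretrained = [
    es_optimize(individual_reward(tg), th, seed=i)
    for i, (tg, th) in enumerate(zip(targets, init))
]

# Stage 2 (cooperative training) would start from `pretrained`; omitted here.
before = team_reward(init, targets)
after = team_reward(pretrained, targets)
print("team reward before stage 1:", before)
print("team reward after stage 1:", after)
```

In this toy setting the untrained agents never trigger the sparse team reward, which is the situation the abstract describes as hindering exploration. Starting the cooperative stage from individually pretrained parameters is one plausible way to avoid that cold start.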

Authors: SUN Chang; XIA Bingcan; LI Ziyue; XIAO Yinyin; RAO Yuan

Affiliations: Xi’an Key Laboratory of Social Intelligence and Complex Data Processing, School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China; Shaanxi Joint Key Laboratory for Artifact Intelligence, School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China; State Key Laboratory of Intelligent Manufacturing System Technology, Beijing Institute of Electronic System Engineering, Beijing 100854, China

Source: System Simulation Technology (《系统仿真技术》), 2023, No. 3, pp. 205-211 (7 pages)

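Funding: Natural Science Foundation project (U22B2036); Ministry of Science and Technology Key R&D Program project (2019YFB2102300); Special Guidance Fund for World-Class Universities (Disciplines) and Characteristic Development (PY3A022)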

Keywords: multi-agent reinforcement learning; collaborative decision-making; evolutionary strategy; multi-stage

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
