Abstract
To address the low sample efficiency caused by sparse team rewards, the ineffective exploration, and the sensitivity to parameters that arise in multi-agent cooperative training, this study introduces a staged training scheme on top of the MAPPO algorithm and proposes MSMAC, a multi-agent cooperation algorithm based on multi-stage reinforcement learning. The algorithm divides training into two stages: the first builds single-agent policy networks optimized with an evolutionary strategy, and the second trains the multi-agent policy networks cooperatively. Experiments in the multi-agent particle environment show that the multi-stage algorithm improves cooperative performance and also raises sample efficiency and convergence speed.
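The two-stage idea can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the 1-D landmark task, the linear policy, the (1+λ) evolution strategy variant, and the names `episode_return`, `evolve_single_agent`, and `warm_start_team` are all assumptions made for illustration. Stage 1 evolves one agent's policy parameters on a dense individual objective, and stage 2 is reduced here to copying those parameters into every team member as a warm start. The cooperative MAPPO update itself is omitted.

```python
import random


def episode_return(theta, target=1.0, steps=20):
    """Toy single-agent episode (assumed task): a 1-D point moves to a landmark.

    Linear policy: action = theta[0] * (target - pos) + theta[1].
    The return is the negative sum of distances to the landmark.
    """
    pos, total = 0.0, 0.0
    for _ in range(steps):
        action = theta[0] * (target - pos) + theta[1]
        action = max(-1.0, min(1.0, action))  # clip to [-1, 1]
        pos += 0.1 * action
        total -= abs(target - pos)
    return total


def evolve_single_agent(theta, generations=50, offspring=10, sigma=0.2, seed=0):
    """Stage 1 (sketch): a (1+lambda) evolution strategy on policy parameters.

    Each generation samples `offspring` Gaussian perturbations of the parent
    and keeps the best child only if it beats the parent.
    """
    rng = random.Random(seed)
    parent, parent_fit = list(theta), episode_return(theta)
    for _ in range(generations):
        children = [
            [t + sigma * rng.gauss(0.0, 1.0) for t in parent]
            for _ in range(offspring)
        ]
        fits = [episode_return(child) for child in children]
        best = max(range(offspring), key=fits.__getitem__)
        if fits[best] > parent_fit:
            parent, parent_fit = children[best], fits[best]
    return parent, parent_fit


def warm_start_team(theta, n_agents):
    """Stage 2 entry point (sketch): give each agent its own copy of theta.

    Cooperative training (MAPPO in the paper) would start from these copies;
    that update is intentionally left out of this sketch.
    """
    return [list(theta) for _ in range(n_agents)]


if __name__ == "__main__":
    init = [0.0, 0.0]
    trained, fit = evolve_single_agent(init)
    team = warm_start_team(trained, n_agents=3)
    print("return before:", episode_return(init), "after:", fit)
    print("team size:", len(team))
```

Because the strategy is elitist, the stage 1 return never falls below that of the starting parameters, which makes the warm start at least as good as the initial policy on the single-agent task.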
Authors
孙畅
夏昺灿
李梓悦
肖莹莹
饶元
SUN Chang; XIA Bingcan; LI Ziyue; XIAO Yinyin; RAO Yuan (Xi’an Key Laboratory of Social Intelligence and Complex Data Processing, School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China; Shaanxi Joint Key Laboratory for Artifact Intelligence, School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China; State Key Laboratory of Intelligent Manufacturing System Technology, Beijing Institute of Electronic System Engineering, Beijing 100854, China)
Source
《系统仿真技术》
2023, No. 3, pp. 205-211 (7 pages)
System Simulation Technology
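Funding
Natural Science Foundation project (U22B2036)
Ministry of Science and Technology Key R&D Program project (2019YFB2102300)
Special guidance fund for world-class universities (disciplines) and characteristic development (PY3A022)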
Keywords
multi-agent
reinforcement learning
collaborative decision-making
evolutionary strategy
multi-stage