Abstract
To mitigate the low training efficiency and slow convergence of the multi-agent deep deterministic policy gradient (MADDPG) algorithm, the prioritized experience selection mechanism of MADDPG is studied and the PES-MADDPG algorithm is proposed. Firstly, the model and training method of the MADDPG algorithm are analyzed. Then, the multi-agent experience buffer pool is improved: a priority evaluation function is designed based on the error of the critic function and the selection frequency of each experience, and the resulting priority is used as the sampling probability when drawing learning samples to train the neural networks. Finally, six groups of comparative experiments are conducted in both cooperative-navigation and competitive environments. The experimental results show that the prioritized experience selection mechanism improves the training speed of the MADDPG algorithm, and the trained agents perform better. The mechanism is also applicable, to a certain extent, to the training of multiple agents controlled by the deep deterministic policy gradient (DDPG) algorithm.
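The sampling scheme described in the abstract (a higher critic error raises a transition's priority, while repeated selection lowers it, and the normalized priority serves as the sampling probability) can be sketched as follows. This is a minimal illustrative sketch: the class name, the linear reuse penalty, and all parameter values are assumptions, not the paper's exact formulation.

```python
import random


class PrioritizedBuffer:
    """Illustrative prioritized experience buffer.

    Priority combines the magnitude of the critic's error with how
    often a transition has already been drawn, so frequently reused
    samples gradually lose weight. The weighting below is a sketch,
    not the formula from the paper.
    """

    def __init__(self, eps=1e-3, decay=0.1):
        self.storage = []    # stored transitions
        self.errors = []     # |critic error| per transition
        self.counts = []     # how many times each was sampled
        self.eps = eps       # keeps every priority strictly positive
        self.decay = decay   # penalty per previous selection

    def add(self, transition, critic_error):
        self.storage.append(transition)
        self.errors.append(abs(critic_error))
        self.counts.append(0)

    def _priorities(self):
        # Larger critic error -> higher priority;
        # more frequent reuse -> lower priority.
        return [(e + self.eps) / (1.0 + self.decay * c)
                for e, c in zip(self.errors, self.counts)]

    def sample(self, batch_size):
        pri = self._priorities()
        total = sum(pri)
        probs = [p / total for p in pri]
        # Draw indices with probability proportional to priority.
        idx = random.choices(range(len(self.storage)),
                             weights=probs, k=batch_size)
        for i in idx:
            self.counts[i] += 1
        return [self.storage[i] for i in idx]
```

In a MADDPG-style training loop, each agent's critic error for a sampled batch would be fed back into `errors` after every update, so priorities track the current value estimates rather than the ones recorded at storage time.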
Authors
HE Ming; ZHANG Bin; LIU Qiang; CHEN Xi-liang; YANG Cheng (College of Command and Control Engineering, The Army Engineering University of PLA, Nanjing 210007, China; Naval Command College, Nanjing 210000, China)
Source
Control and Decision (《控制与决策》), 2021, No. 1, pp. 68-74 (7 pages)
Indexed in: EI, CSCD, Peking University Core (北大核心)
Funding
National Key R&D Program of China (2018YFC0806900, 2016YFC0800606, 2016YFC0800310)
Natural Science Foundation of Jiangsu Province (BK20161469)
Key R&D Program of Jiangsu Province (BE2016904, BE2017616, BE2018754)
China Postdoctoral Science Foundation (2018M633757).
Keywords
multi-agent
deep reinforcement learning
MADDPG
prioritized experience selection