
LSTM-MADDPG multi-agent cooperative decision algorithm based on asynchronous collaborative update
Abstract: In fully cooperative tasks, the multi-agent deep deterministic policy gradient (MADDPG) algorithm suffers from poor credit assignment and unstable training. To address these problems, an LSTM-MADDPG multi-agent cooperative decision algorithm based on asynchronous collaborative update is proposed. Drawing on the ideas of difference rewards and value decomposition, a long short-term memory (LSTM) network is used to extract features across trajectory sequences and to improve the division of the global reward, so that each agent is assigned a reward matched to its own actions. To meet the requirements of joint training, a high-quality training sample set is constructed and an asynchronous cooperative update method is designed, enabling stable joint training of the LSTM-MADDPG network. Simulation results in a cooperative capture scenario show that the proposed algorithm converges 20.51% faster during training than QMIX; after training converges, the asynchronous cooperative update reduces the mean square error of the normalized reward by 57.59% compared with synchronous updating, improving the stability of convergence.
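
The following is a minimal, illustrative Python/PyTorch sketch of the two ideas named in the abstract: an LSTM module that divides the global reward among agents (in the spirit of difference rewards and value decomposition), and an asynchronous schedule in which that credit-assignment module is updated less often than the MADDPG-side networks. All network sizes, the softmax-based reward split, the surrogate losses, and the update period K are assumptions made for illustration; the paper's actual architecture and training details are not given in the abstract.

# Illustrative sketch only -- not the authors' implementation.
import torch
import torch.nn as nn

# Assumed toy dimensions; the paper's actual settings are not stated in the abstract.
N_AGENTS, OBS_DIM, ACT_DIM, SEQ_LEN, HIDDEN = 3, 16, 2, 8, 64
JOINT_DIM = N_AGENTS * (OBS_DIM + ACT_DIM)

class LSTMCreditAssigner(nn.Module):
    """LSTM over the joint trajectory; outputs each agent's share of the global reward."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(JOINT_DIM, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, N_AGENTS)

    def forward(self, traj):                      # traj: (batch, T, JOINT_DIM)
        feat, _ = self.lstm(traj)                 # features across the trajectory sequence
        return torch.softmax(self.head(feat[:, -1]), dim=-1)   # weights sum to 1 per sample

class ToyCritic(nn.Module):
    """Stand-in for the per-agent MADDPG critics (centralized input, one value per agent)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(JOINT_DIM, HIDDEN), nn.ReLU(),
                                 nn.Linear(HIDDEN, N_AGENTS))
    def forward(self, joint):
        return self.net(joint)

credit_net, critic = LSTMCreditAssigner(), ToyCritic()
credit_opt = torch.optim.Adam(credit_net.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
K = 4   # assumed period: the credit module is refreshed once every K MADDPG-side updates

for step in range(1, 201):
    # Dummy batch standing in for "high-quality" samples drawn from the replay buffer.
    traj = torch.randn(32, SEQ_LEN, JOINT_DIM)
    global_reward = torch.randn(32)

    weights = credit_net(traj)                                 # per-agent credit weights
    per_agent_reward = weights * global_reward.unsqueeze(-1)   # r_i = w_i * R

    # MADDPG-side update (every step): critics regress onto the current reward split.
    q = critic(traj[:, -1])                                    # (batch, N_AGENTS)
    critic_loss = ((q - per_agent_reward.detach()) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Asynchronous cooperative update (every K-th step): only now does the LSTM credit
    # module change, here via a value-decomposition-style surrogate loss.
    if step % K == 0:
        mixed = (weights * q.detach()).sum(dim=-1)             # weighted mix of agent values
        credit_loss = ((mixed - global_reward) ** 2).mean()
        credit_opt.zero_grad()
        credit_loss.backward()
        credit_opt.step()

Staggering the two updates in this way means the MADDPG-side networks train for several steps against a temporarily fixed reward division, which is the stabilizing effect the abstract attributes to asynchronous cooperative updating relative to synchronous updating.
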
Authors: GAO Jing-peng (高敬鹏), WANG Guo-xuan (王国轩), GAO Lu (高路) (College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China; National Key Laboratory of Science and Technology on Test Physics and Numerical Mathematics, Beijing Institute of Space Long March Vehicle, Beijing 100076, China)
Source: Journal of Jilin University (Engineering and Technology Edition), 2024, No. 3, pp. 797-806 (10 pages); indexed in EI, CAS, CSCD, and the Peking University Core list.
Funding: Project of the State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System (CEMEE2021G0001).
Keywords: artificial intelligence; multi-agent cooperative decision making; deep reinforcement learning; credit assignment; asynchronous cooperative update