Abstract
To address the low exploration efficiency and slow convergence of multi-agent decision-making algorithms in complex military scenarios, this paper proposes AP-MADDPG, a multi-agent deep deterministic policy gradient algorithm based on a multi-head attention mechanism and prioritized experience replay. Prioritized experience replay replaces uniform sampling to reduce training time, while the multi-head attention mechanism enables stable and efficient cooperation and competition among agents in a complex adversarial environment. Experimental results show that the algorithm allows multiple agents to learn joint policies more effectively, converges faster, is more stable, and obtains higher episode rewards.
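The abstract credits prioritized experience replay with the reduced training time. As a point of reference, the standard proportional scheme (sampling transitions with probability proportional to |TD error|^alpha and correcting the bias with importance-sampling weights) can be sketched as below; this is a minimal illustration of the general technique, not the paper's implementation, and all class and parameter names are assumptions.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay.

    Transitions are sampled with probability p_i^alpha / sum_k p_k^alpha,
    and importance-sampling weights correct the induced bias.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.data = [None] * capacity
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0
        self.size = 0

    def add(self, transition):
        # New transitions get the current max priority so each is replayed at least once.
        max_p = self.priorities[:self.size].max() if self.size > 0 else 1.0
        self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, beta=0.4):
        # Sampling probability is proportional to priority^alpha.
        probs = self.priorities[:self.size] ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(self.size, batch_size, p=probs)
        # Importance-sampling weights, normalized by their max for stability.
        weights = (self.size * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Priority is the absolute TD error plus a small constant to keep it nonzero.
        self.priorities[idx] = np.abs(td_errors) + eps
```

In a MADDPG-style loop, `update_priorities` would be called after each critic update with the fresh TD errors of the sampled batch.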
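The other component the abstract names is multi-head attention over the agents. The core mechanism, shown below as a self-contained sketch, is scaled dot-product attention computed independently per head and concatenated; one agent's encoding serves as the query and the other agents' encodings as keys and values. The function signature and shapes here are illustrative assumptions, not the paper's network.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_head_attention(q, K, V, num_heads):
    """Scaled dot-product attention with num_heads independent heads.

    q: query vector of shape (d,), e.g. one agent's encoded observation.
    K, V: key/value matrices of shape (n, d), e.g. n other agents' encodings.
    Returns the concatenation of the per-head attention outputs, shape (d,).
    """
    d = q.shape[-1]
    assert d % num_heads == 0, "embedding size must divide evenly across heads"
    dh = d // num_heads
    outputs = []
    for h in range(num_heads):
        # Each head attends over its own slice of the embedding.
        qh = q[h * dh:(h + 1) * dh]
        Kh = K[:, h * dh:(h + 1) * dh]
        Vh = V[:, h * dh:(h + 1) * dh]
        scores = Kh @ qh / np.sqrt(dh)      # similarity of the query to each agent
        attn = softmax(scores)              # attention weights over the n agents
        outputs.append(attn @ Vh)           # weighted sum of values
    return np.concatenate(outputs)
```

In practice each head would apply learned linear projections to q, K, and V before the dot products; they are omitted here to keep the sketch minimal.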
Authors
GONG Huiwen, WANG Tong, CHEN Liwei, XUE Shuyu, JIN Dingquan
College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China; Key Laboratory of Advanced Marine Communication and Information Technology, Ministry of Industry and Information Technology, Harbin Engineering University, Harbin 150001, China
Source
Applied Science and Technology (《应用科技》)
CAS
2022, Issue 5, pp. 1-7 (7 pages)
Funding
National Natural Science Foundation of China (61102105)
National Defense Science and Technology Key Laboratory Foundation (6142209190107)
Key Laboratory of Advanced Marine Communication and Information Technology, Ministry of Industry and Information Technology (AMCIT2101-08)
Fundamental Research Funds for the Central Universities (3072021CF0813)
Keywords
multi-agent
reinforcement learning
deep deterministic policy
prioritized experience replay
multi-head attention mechanism
intelligent decision-making
joint policy
cooperation and competition