Abstract
Multi-agent pursuit-evasion games are usually studied in a two-dimensional plane, with the evader's motion left unconstrained. In addition, traditional methods have difficulty designing control strategies when no accurate model is available. For the case in which the evaders' motion is constrained in three-dimensional space, this paper proposes a multi-agent evasion algorithm based on the deep Q-network (DQN). The algorithm uses decentralized (distributed) learning: each evader explores the environment and learns an evasion strategy that meets the desired requirements. To improve learning efficiency, strategy learning is divided into two stages according to task difficulty, and a corresponding reward function is designed for each stage to guide the agents toward the desired evasion strategy. Simulation results show that the learned evasion strategy performs stably and generalizes: the evaders still escape successfully when their initial positions are changed to a certain extent.
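As a rough illustration of the ingredients named in the abstract, the Python sketch below shows the standard DQN temporal-difference target, epsilon-greedy exploration over a discrete action set, and a two-stage reward. This record does not give the paper's action space, motion constraints, or reward terms, so the function names (`epsilon_greedy`, `dqn_target`, `staged_reward`) and every constant (capture radius, safety margin, penalty values) are hypothetical. The neural network that produces the Q-values is omitted; the functions take Q-values as plain lists. In a decentralized setup, each evader would hold its own Q-network and call these functions with its own observations.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the greedy action index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def dqn_target(reward, next_q_target, done, gamma=0.99):
    """Standard DQN TD target: r + gamma * max_a' Q_target(s', a').

    next_q_target holds the target network's Q-values for the next state.
    No bootstrapping past a terminal state (e.g. capture or episode end).
    """
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def staged_reward(stage, dist, prev_dist, captured, capture_radius=1.0):
    """Hypothetical two-stage reward for one evader (not the paper's actual terms).

    dist / prev_dist: current / previous distance to the nearest pursuer.
    Stage 1 (easier task): reward any increase in that distance.
    Stage 2 (harder task): reward staying outside a safety margin,
    small penalty while inside it.
    Capture is penalized heavily in both stages.
    """
    if captured or dist <= capture_radius:
        return -10.0
    if stage == 1:
        return dist - prev_dist
    margin = 3.0 * capture_radius  # hypothetical safety margin
    return 1.0 if dist > margin else -0.1


# Usage: one transition for a single evader (all values are made up).
rng = random.Random(0)
action = epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.1, rng=rng)
r = staged_reward(stage=1, dist=5.0, prev_dist=4.0, captured=False)
y = dqn_target(r, next_q_target=[0.5, 2.0], done=False, gamma=0.9)
```

Splitting the reward by stage is one simple way to realize a curriculum: the stage-1 signal is dense and easy to improve on, while the stage-2 signal encodes the stricter objective once basic evasive behavior has been learned.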
Authors
YAN Bo-wei
DU Run-le
BAN Xiao-jun
ZHOU Di
(School of Astronautics, Harbin Institute of Technology, Harbin 150000, China; National Key Laboratory of Science and Technology on Test Physics and Numerical Mathematics, Beijing 100076, China)
Source
Navigation Positioning and Timing
CSCD
2022, No. 6, pp. 40-47 (8 pages)
Keywords
Evasion algorithm
Deep reinforcement learning
Multi-agent
Deep Q-Network