Abstract
To address autonomous counter-pursuit maneuvering in close-range air combat, a Markov decision process (MDP) model of UAV counter-pursuit is established. On this basis, an autonomous maneuvering decision method for unmanned aerial vehicles (UAVs) based on deep reinforcement learning is proposed. The method improves the twin delayed deep deterministic policy gradient (TD3) algorithm by reconstructing the experience replay buffer, and produces the optimal policy network by fitting the policy function and the state-action value function. Simulation experiments show that, under random initial positions and attitudes, the UAV trained with this method achieves a win rate above 93% against a UAV using pure pursuit. Compared with the conventional TD3 and deep deterministic policy gradient (DDPG) algorithms, the method converges faster and is more stable.
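For readers unfamiliar with the baseline, the following is a minimal sketch of the two ingredients of the standard TD3 target computation: clipped double-Q learning and target policy smoothing. It is not the authors' improved algorithm; the abstract does not describe how the experience replay buffer is reconstructed, so that part is not shown. The function names and the default hyperparameters (`gamma`, `noise_std`, `noise_clip`, action bounds) are illustrative assumptions.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def smoothed_target_action(target_action, noise_std=0.2, noise_clip=0.5,
                           act_low=-1.0, act_high=1.0, rng=random):
    """Target policy smoothing: perturb the target policy's action with
    clipped Gaussian noise, then clamp to the action bounds.
    All default values here are assumed, not taken from the paper."""
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    return clip(target_action + noise, act_low, act_high)


def td3_target(reward, done, next_q1, next_q2, gamma=0.99):
    """Clipped double-Q target:
    y = r + gamma * (1 - done) * min(Q1'(s', a~), Q2'(s', a~)).
    next_q1 and next_q2 are the two target critics' values at the
    smoothed target action; done is 1.0 for terminal transitions."""
    return reward + gamma * (1.0 - done) * min(next_q1, next_q2)
```

Taking the minimum of the two target critics counteracts the value overestimation that DDPG is prone to, which is the main reason TD3 is usually the more stable baseline of the two.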
Authors
郭万春
解武杰
尹晖
董文瀚
GUO Wanchun; XIE Wujie; YIN Hui; DONG Wenhan (Aeronautical Engineering College, Air Force Engineering University, Xi’an 710038, China; Teaching & Research Support Center, Air Force Engineering University)
Source
《空军工程大学学报(自然科学版)》
CSCD
Peking University Core Journals (北大核心)
2021, No. 4, pp. 15-21 (7 pages)
Journal of Air Force Engineering University(Natural Science Edition)
Keywords
deep reinforcement learning
close air combat
UAV
twin delayed deep deterministic policy gradient method