Funding: The authors acknowledge the National Natural Science Foundation of China (Grant No. 62003267), the Fundamental Research Funds for the Central Universities (Grant No. G2022KY0602), the Technology on Electromagnetic Space Operations and Applications Laboratory (Grant No. 2022ZX0090), and the Key Core Technology Research Plan of Xi'an (Grant No. 21RGZN0016) for funding the experiments.
Abstract: The demand for autonomous motion control of unmanned aerial vehicles in air combat is growing, as taking the initiative in combat becomes increasingly crucial. However, the inability of unmanned aerial vehicles to manoeuvre autonomously during air combat, which features highly dynamic and uncertain enemy manoeuvres, limits their combat capabilities and remains very challenging. To meet this challenge, this article proposes an autonomous manoeuvre decision model using an expert actor-based soft actor critic algorithm that reconstructs the experience replay buffer with expert experience. Specifically, the algorithm uses a small amount of expert experience to increase the diversity of the samples, which can greatly improve the exploration and utilisation efficiency of deep reinforcement learning. To simulate the complex battlefield environment, a one-to-one air combat model is established and the concept of the missile's attack region is introduced. The model enables one-to-one air combat to be simulated under different initial battlefield situations. Simulation results show that the expert actor-based soft actor critic algorithm finds the most favourable policy for unmanned aerial vehicles to defeat the opponent faster, and converges more quickly, than the standard soft actor critic algorithm.
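The core idea of the first abstract, seeding the replay buffer with a small amount of expert experience to diversify early training batches, can be sketched as follows. This is a minimal illustration of the general technique, not the paper's exact buffer design; the class and parameter names (e.g. `expert_fraction`) are hypothetical.

```python
import random

class ExpertSeededReplayBuffer:
    """Replay buffer pre-filled with a small set of expert transitions so that
    early mini-batches already contain diverse, informative samples.
    Illustrative sketch only; the paper's exact reconstruction may differ."""

    def __init__(self, capacity, expert_transitions):
        self.capacity = capacity
        # Expert experience is kept permanently; agent experience is cyclic.
        self.expert = list(expert_transitions)
        self.agent = []
        self.pos = 0

    def add(self, transition):
        # Standard ring-buffer insertion for agent-collected transitions.
        if len(self.agent) < self.capacity:
            self.agent.append(transition)
        else:
            self.agent[self.pos] = transition
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, expert_fraction=0.25):
        # Draw a fixed fraction of each batch from expert data,
        # the remainder from the agent's own experience.
        n_expert = min(int(batch_size * expert_fraction), len(self.expert))
        n_agent = min(batch_size - n_expert, len(self.agent))
        return (random.sample(self.expert, n_expert)
                + random.sample(self.agent, n_agent))
```

In practice the expert fraction is often annealed toward zero as the agent's own experience becomes competitive with the demonstrations.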
Funding: The authors acknowledge the National Natural Science Foundation of China (Grant Nos. 61573285 and 62003267), the Open Fund of the Key Laboratory of Data Link Technology of China Electronics Technology Group Corporation (Grant No. CLDL-20182101), and the Natural Science Foundation of Shaanxi Province (Grant No. 2020JQ220) for funding the experiments.
Abstract: Aiming at intelligent decision-making of unmanned aerial vehicles (UAVs) based on situation information in air combat, a novel maneuvering decision method based on deep reinforcement learning is proposed in this paper. The autonomous maneuvering model of the UAV is established as a Markov Decision Process. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and the Deep Deterministic Policy Gradient (DDPG) algorithm in deep reinforcement learning are used to train the model, and the experimental results of the two algorithms are analyzed and compared. The simulation results show that, compared with the DDPG algorithm, the TD3 algorithm has stronger decision-making performance and faster convergence speed and is more suitable for solving combat problems. The proposed algorithm enables UAVs to autonomously make maneuvering decisions based on situation information such as position, speed, and relative azimuth, adjust their actions to approach the enemy, and strike successfully, providing a new method for intelligent maneuvering decisions during air combat.
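The TD3 algorithm compared above extends DDPG with three standard modifications: clipped double-Q targets, target policy smoothing, and delayed actor updates. The first two can be sketched in a few lines; function names and hyperparameter values here are illustrative defaults from the general TD3 literature, not taken from the paper.

```python
import random

def td3_target(reward, gamma, q1_next, q2_next):
    """Clipped double-Q target: TD3 takes the minimum of two target critics
    to curb the value overestimation that a single-critic DDPG suffers from."""
    return reward + gamma * min(q1_next, q2_next)

def smoothed_target_action(mu_next, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Target policy smoothing: add clipped Gaussian noise to the target
    action so the critic is regularized over a neighbourhood of actions
    rather than fitted to a single point."""
    noise = max(-noise_clip, min(noise_clip, random.gauss(0.0, noise_std)))
    return max(-act_limit, min(act_limit, mu_next + noise))
```

The third modification, updating the actor (and target networks) only once every few critic updates, is what gives the algorithm its "Twin Delayed" name and contributes to the faster, more stable convergence reported above.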
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 62003267 and 61573285), the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2020JQ-220), the Open Project of Science and Technology on Electronic Information Control Laboratory, China (No. JS20201100339), and the Open Project of Science and Technology on Electromagnetic Space Operations and Applications Laboratory, China (No. JS20210586512).
Abstract: As an advanced combat weapon, Unmanned Aerial Vehicles (UAVs) have been widely used in military conflicts. In this paper, we formulate the Autonomous Navigation Control (ANC) problem of UAVs as a Markov Decision Process (MDP) and propose a novel Deep Reinforcement Learning (DRL) method that allows UAVs to perform dynamic target tracking tasks in large-scale unknown environments. To address the problem of limited training experience, the proposed Imaginary Filtered Hindsight Experience Replay (IFHER) generates successful episodes by plausibly imagining the target trajectory in each failed episode to augment the experiences. The well-designed goal, episode, and quality filtering strategies ensure that only high-quality augmented experiences are stored, while the sampling filtering strategy of IFHER ensures that these stored augmented experiences are fully learned according to their high priorities. Trained in a complex environment constructed from the parameters of a real UAV, the proposed IFHER algorithm improves the convergence speed by 28.99% and the convergence result by 11.57% compared to the state-of-the-art Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. Testing experiments carried out in environments of different complexities demonstrate the strong robustness and generalization ability of the IFHER agent. Moreover, the flight trajectory of the IFHER agent shows the superiority of the learned policy and the practical application value of the algorithm.
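IFHER builds on standard Hindsight Experience Replay (HER), whose core relabeling step turns a failed episode into a successful one by pretending the state actually reached was the goal all along. The sketch below shows only that base mechanism; IFHER's trajectory imagination and its goal/episode/quality/sampling filters are specific to the paper and are not reproduced here. The transition layout and `reward_fn` are illustrative assumptions.

```python
def her_relabel(episode, reward_fn):
    """Standard hindsight relabeling: substitute the final achieved state of
    a failed episode for the original goal, so the episode yields a positive
    learning signal. Each input transition is (state, action, achieved, goal);
    each output transition appends the recomputed reward."""
    new_goal = episode[-1][2]  # the final achieved state becomes the goal
    relabeled = []
    for state, action, achieved, _ in episode:
        relabeled.append(
            (state, action, achieved, new_goal, reward_fn(achieved, new_goal))
        )
    return relabeled
```

Under the usual sparse reward (0 on reaching the goal, -1 otherwise), the last relabeled transition is always a success, which is precisely what makes hindsight-style replay effective when genuine successes are rare, as in the limited-experience setting the abstract describes.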