Funding: National Natural Science Foundation of China (Grant No. 62003267); Fundamental Research Funds for the Central Universities (Grant No. G2022KY0602); Technology on Electromagnetic Space Operations and Applications Laboratory (Grant No. 2022ZX0090); Key Core Technology Research Plan of Xi'an (Grant No. 21RGZN0016).
Abstract: To address the problem of manoeuvring decision-making in UAV air combat, this study establishes a one-to-one air combat model, defines missile attack areas, and uses the non-deterministic-policy Soft Actor-Critic (SAC) algorithm from deep reinforcement learning to construct a decision model that realises the manoeuvring process. The computational complexity of the proposed algorithm is calculated, and the stability of the closed-loop air combat decision-making system controlled by the neural network is analysed with a Lyapunov function. The study formulates the UAV air combat process as a game and proposes a Parallel Self-Play training SAC algorithm (PSP-SAC) to improve the generalisation performance of UAV control decisions. Simulation results show that the proposed algorithm realises sample sharing and policy sharing across multiple combat environments and significantly improves the generalisation ability of the model compared with independent training.
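The sample-sharing and policy-sharing mechanism of parallel self-play can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy `DuelEnv`, the `mirror` state transform, and the random placeholder policy are hypothetical stand-ins; in PSP-SAC the policy would be a SAC actor network updated from the shared buffer.

```python
import random

class SharedReplayBuffer:
    """One buffer receives transitions from every parallel environment
    (sample sharing); a single policy acts in all of them (policy sharing)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []

    def add(self, transition):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)  # drop the oldest transition when full
        self.storage.append(transition)

    def sample(self, batch_size):
        return random.sample(self.storage, min(batch_size, len(self.storage)))

class DuelEnv:
    """Hypothetical 1-D stand-in for a one-to-one air combat environment:
    the state is the red-blue separation, and each action nudges it."""
    def __init__(self, initial_gap):
        self.initial_gap = initial_gap

    def reset(self):
        self.gap = self.initial_gap
        return self.gap

    def step(self, red_action, blue_action):
        self.gap += red_action - blue_action
        reward = -abs(self.gap)  # red is rewarded for closing the gap
        return self.gap, reward

def mirror(state):
    """The opponent sees the same situation from the other side."""
    return -state

def collect_self_play(policy, envs, buffer, steps):
    """Self-play: both UAVs are driven by the same shared policy, and
    transitions from all parallel environments go into one shared buffer."""
    for env in envs:
        state = env.reset()
        for _ in range(steps):
            red_action = policy(state)
            blue_action = policy(mirror(state))  # opponent reuses the policy
            next_state, reward = env.step(red_action, blue_action)
            buffer.add((state, red_action, reward, next_state))
            state = next_state

# Different initial battlefield situations feed one shared buffer and policy
envs = [DuelEnv(g) for g in (1.0, 5.0, 10.0)]
buffer = SharedReplayBuffer(capacity=1000)
policy = lambda s: random.uniform(-1.0, 1.0)  # placeholder for the SAC actor
collect_self_play(policy, envs, buffer, steps=20)
print(len(buffer.storage))  # 60 transitions pooled from 3 environments
```

In a full SAC training loop, minibatches drawn from this shared buffer would update one actor-critic pair, so experience gathered in any combat environment improves the single shared policy.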
Funding: The authors acknowledge the National Natural Science Foundation of China (Grant No. 62003267), the Fundamental Research Funds for the Central Universities (Grant No. G2022KY0602), the Technology on Electromagnetic Space Operations and Applications Laboratory (Grant No. 2022ZX0090), and the Key Core Technology Research Plan of Xi'an (Grant No. 21RGZN0016) for funding the experiments.
Abstract: The demand for autonomous motion control of unmanned aerial vehicles (UAVs) in air combat is growing as taking the initiative in combat becomes increasingly crucial. However, the inability of UAVs to manoeuvre autonomously during air combat, which features highly dynamic and uncertain enemy manoeuvres, limits their combat capabilities and proves very challenging. To meet this challenge, this article proposes an autonomous manoeuvre decision model using an expert actor-based soft actor-critic algorithm that reconstructs the experience replay buffer with expert experience. Specifically, the algorithm uses a small amount of expert experience to increase the diversity of the samples, which can greatly improve the exploration and utilisation efficiency of deep reinforcement learning. To simulate the complex battlefield environment, a one-to-one air combat model is established and the concept of a missile's attack region is introduced. The model enables one-to-one air combat to be simulated under different initial battlefield situations. Simulation results show that the expert actor-based soft actor-critic algorithm finds the most favourable policy for UAVs to defeat the opponent faster, and converges more quickly, than the standard soft actor-critic algorithm.
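The core idea of seeding the replay buffer with a small amount of expert experience can be sketched as below. This is a hedged illustration, not the paper's algorithm: the `expert_fraction` knob, the transition format, and the placeholder tuples are all assumptions; in the actual method the expert data would come from recorded expert manoeuvres and the buffer would feed a soft actor-critic update.

```python
import random

class ExpertSeededReplayBuffer:
    """Replay buffer reconstructed with expert experience: every minibatch
    mixes a fixed fraction of expert transitions with agent transitions."""
    def __init__(self, capacity, expert_transitions, expert_fraction=0.25):
        self.capacity = capacity
        self.expert = list(expert_transitions)   # small, fixed demo set
        self.agent = []                          # grows during training
        self.expert_fraction = expert_fraction

    def add(self, transition):
        if len(self.agent) >= self.capacity:
            self.agent.pop(0)  # only agent experience is evicted
        self.agent.append(transition)

    def sample(self, batch_size):
        # Guarantee a share of expert samples in every minibatch
        n_expert = min(len(self.expert), int(batch_size * self.expert_fraction))
        batch = random.sample(self.expert, n_expert)
        n_agent = min(len(self.agent), batch_size - n_expert)
        batch += random.sample(self.agent, n_agent)
        random.shuffle(batch)
        return batch

# Placeholder transitions: (state, action, reward, next_state, source)
expert_demos = [((i,), 0.5, 1.0, (i + 1,), "expert") for i in range(8)]
buffer = ExpertSeededReplayBuffer(1000, expert_demos, expert_fraction=0.25)
for i in range(100):
    buffer.add(((i,), 0.0, -1.0, (i + 1,), "agent"))

batch = buffer.sample(16)
n_expert = sum(1 for t in batch if t[-1] == "expert")
print(len(batch), n_expert)  # 16 samples, 4 of them from the expert set
```

The guaranteed expert share keeps rare but informative manoeuvres in every update, which is one plausible way the added sample diversity described above could speed up exploration early in training.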