To reach a higher level of autonomy for the Unmanned Combat Aerial Vehicle (UCAV) in air combat games, this paper builds an autonomous maneuver decision system. In this system, the air combat game is regarded as a Markov process, so that the air combat situation can be effectively calculated via Bayesian inference theory. According to the situation assessment result, the system adaptively adjusts the weights of the maneuver decision factors, which makes the objective function more reasonable and secures a superior situation for the UCAV. As the air combat game is characterized by high dynamics and a significant amount of uncertainty, fuzzy logic is used to build the functions of the four maneuver decision factors, enhancing the robustness and effectiveness of the maneuver decision results. Accurate prediction of the opponent aircraft is also essential for making a good decision; therefore, a prediction model of the opponent aircraft is designed based on the elementary maneuver method. Finally, the moving horizon optimization strategy is used to effectively model the whole air combat maneuver decision process. Various simulations are performed on typical scenario tests and close-in dogfights, and the results sufficiently demonstrate the superiority of the designed maneuver decision method.
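As a concrete illustration of the factor functions described above, the following Python sketch builds four smooth advantage functions (angle, range, velocity, altitude) and a situation-weighted objective. The functional forms, parameter values, and state fields are hypothetical stand-ins chosen for readability, not the paper's actual fuzzy membership functions or weight-adaptation rule.

```python
import numpy as np

def angle_advantage(aspect_angle):
    # Fuzzy-style membership: near 1.0 when pointing directly at the
    # target, decaying smoothly as the aspect angle grows (assumed shape).
    return np.exp(-(aspect_angle / np.deg2rad(60.0)) ** 2)

def range_advantage(distance, r_opt=1000.0, sigma=800.0):
    # Peaks near an assumed optimal engagement range r_opt (metres).
    return np.exp(-((distance - r_opt) / sigma) ** 2)

def velocity_advantage(v_own, v_opp):
    # Saturating advantage for flying faster than the opponent (m/s).
    return 1.0 / (1.0 + np.exp(-(v_own - v_opp) / 50.0))

def altitude_advantage(h_own, h_opp):
    # Saturating advantage for holding an altitude edge (metres).
    return 1.0 / (1.0 + np.exp(-(h_own - h_opp) / 500.0))

def objective(state, weights):
    # Weighted sum of the four decision factors; in the paper the weights
    # would be adapted from the Bayesian situation-assessment result.
    factors = [
        angle_advantage(state["aspect_angle"]),
        range_advantage(state["distance"]),
        velocity_advantage(state["v_own"], state["v_opp"]),
        altitude_advantage(state["h_own"], state["h_opp"]),
    ]
    return float(np.dot(weights, factors))

# Toy usage: an offensive situation might up-weight the angle factor.
weights = np.array([0.4, 0.3, 0.2, 0.1])
state = {"aspect_angle": np.deg2rad(20.0), "distance": 1500.0,
         "v_own": 250.0, "v_opp": 230.0, "h_own": 5000.0, "h_opp": 4500.0}
print(objective(state, weights))
```

A moving-horizon decision step would then evaluate this objective over each candidate elementary maneuver against the predicted opponent state and pick the maximizer.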
Reinforcement Learning (RL) algorithms enhance the intelligence of air combat Autonomous Maneuver Decision (AMD) policies, but they may underperform in target combat environments with disturbances. To enhance the robustness of the AMD strategy learned by RL, this study proposes a Tube-based Robust RL (TRRL) method. First, this study introduces a tube to describe the reachable trajectories under disturbances, formulates a method for calculating tubes based on sum-of-squares programming, and proposes the TRRL algorithm, which enhances robustness by utilizing tube size as a quantitative indicator. Second, this study introduces offline techniques for regressing the tube size function and establishing a tube library before policy learning, aiming to eliminate complex online tube solving and reduce the computational burden during training. Furthermore, an analysis of the tube library demonstrates that the mitigated AMD strategy achieves greater robustness, as smaller tube sizes correspond to more cautious actions. This finding highlights that TRRL enhances robustness by promoting a conservative policy. To effectively balance aggressiveness and robustness, the proposed TRRL algorithm introduces a "laziness factor" as a weight of robustness. Finally, combat simulations in an environment with disturbances confirm that the AMD policy learned by the TRRL algorithm exhibits superior air combat performance compared to selected robust RL baselines.
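A minimal sketch of the reward-shaping idea behind TRRL, assuming the robustness penalty enters as a tube size weighted by the laziness factor. The TubeLibrary contents, the discrete action labels, and the additive shaping form are illustrative assumptions, not the paper's formulation, which derives tube sizes via sum-of-squares programming and regresses them offline.

```python
class TubeLibrary:
    """Offline lookup from a discretised action class to a tube size.

    Stand-in for the paper's regressed tube size function; the labels
    and sizes below are hypothetical illustration values.
    """
    def __init__(self):
        # Assumption: more aggressive maneuvers yield larger reachable
        # sets (tubes) under the same disturbance bound.
        self.sizes = {"gentle": 0.1, "moderate": 0.4, "aggressive": 1.0}

    def tube_size(self, action_label: str) -> float:
        return self.sizes[action_label]


def robust_reward(base_reward: float, action_label: str,
                  library: TubeLibrary, laziness: float = 0.5) -> float:
    # TRRL-style shaping: penalise the reward by the tube size, weighted
    # by the "laziness factor" that trades aggressiveness for robustness.
    return base_reward - laziness * library.tube_size(action_label)


lib = TubeLibrary()
for label in ("gentle", "moderate", "aggressive"):
    print(label, robust_reward(1.0, label, lib))
```

With laziness set to 0 the shaping vanishes and the agent trains on the raw combat reward; raising it pushes the learned policy toward smaller tubes, i.e., the more conservative behavior the abstract describes.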
To address the difficulty of predicting enemy maneuver strategies and the low win rate caused by the complex information and strong adversarial nature of Unmanned Aerial Vehicle (UAV) air combat environments, a guided Minimax-DDQN (Minimax-Double Deep Q-Network) algorithm is designed. First, a guided policy exploration mechanism is proposed on the basis of the Minimax decision method. Then, combining the guided Minimax strategy, a DDQN (Double Deep Q-Network) algorithm is designed with the aim of improving the update efficiency of the Q-network. Finally, a progressive three-stage network training method is proposed, in which adversarial training between different decision models yields a more optimized decision model. Experimental results show that, compared with algorithms such as Minimax-DQN and Minimax-DDQN, the proposed algorithm improves the success rate of pursuing a straight-flying target by 14% to 60%, and its win rate against the DDQN algorithm is no lower than 60%. Hence, compared with DDQN, Minimax-DDQN, and similar algorithms, the proposed algorithm has stronger decision-making capability and better adaptability in highly adversarial combat environments.
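The one-step target below sketches how the Double-Q idea can be grafted onto a minimax Q-function over own and opponent actions: the online network selects the minimax action pair, and the target network evaluates it, damping the overestimation of plain Minimax-DQN. The guided exploration mechanism and the three-stage training scheme are not shown; the array shapes and toy values are assumptions for illustration.

```python
import numpy as np

def minimax_ddqn_target(reward, next_q_online, next_q_target,
                        gamma=0.99, done=False):
    """One-step Minimax-DDQN target (illustrative sketch).

    next_q_online / next_q_target: arrays of shape
    (n_own_actions, n_opponent_actions) holding Q(s', a, o) from the
    online and target networks, respectively.
    """
    if done:
        return reward
    # For each own action, assume the opponent replies with its worst case.
    worst_case = next_q_online.min(axis=1)        # min over opponent actions
    a_star = int(worst_case.argmax())             # own minimax action
    o_star = int(next_q_online[a_star].argmin())  # opponent best reply
    # Double-Q: selection by the online net, evaluation by the target net.
    return reward + gamma * next_q_target[a_star, o_star]

# Toy usage with random Q tables for 3 own and 3 opponent actions.
rng = np.random.default_rng(0)
q_on = rng.normal(size=(3, 3))
q_tg = rng.normal(size=(3, 3))
print(minimax_ddqn_target(reward=0.1, next_q_online=q_on, next_q_target=q_tg))
```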
Funding: supported by the National Natural Science Foundation of China (61601505), the Aeronautical Science Foundation of China (20155196022), and the Shaanxi Natural Science Foundation of China (2016JQ6050).