Abstract
In future combat scenarios involving manned and unmanned aerial vehicle cooperation, real-time and accurate air combat decision-making is the key to winning. The complex air environment, rapidly changing situation data, and multiple cumbersome combat tasks mean that manned/unmanned cooperative combat will replace single-aircraft combat as the trend in future air combat. However, multi-agent modeling and training suffer from difficult reward allocation and poor network convergence. For a 5v5 manned/unmanned cooperative air combat scenario, this paper abstracts feature models of the manned and unmanned aircraft agents and proposes an intelligent air combat decision-making algorithm based on proximal policy optimization. A situation-assessment reward guides the agents' decisions toward favorable situations during combat, so that the air combat decision sequence is produced through real-time interaction with the environment. The proposed algorithm is verified by simulation experiments. The results show that, after training, it can adapt to complex battlefield situations and obtain a stable and reasonable decision-making strategy in a continuous action space.
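The abstract names proximal policy optimization and a situation-assessment reward without giving formulas. As a rough illustration only, the sketch below shows the standard PPO clipped surrogate objective together with one possible way of adding a situation score to an outcome reward. The function names, the additive form of the shaped reward, and the values `eps=0.2` and `weight=0.1` are assumptions made for this sketch and are not taken from the paper.

```python
def clipped_surrogate(ratios, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    ratios:     new-policy / old-policy probability ratios, one per sample
    advantages: advantage estimates, one per sample
    eps:        clipping range (0.2 is a common default, assumed here)
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Clip the ratio to [1 - eps, 1 + eps].
        clipped_r = max(1.0 - eps, min(r, 1.0 + eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


def shaped_reward(outcome_reward, situation_score, weight=0.1):
    """Hypothetical reward shaping: outcome reward plus a weighted
    situation-assessment score. The additive form is an assumption."""
    return outcome_reward + weight * situation_score
```

With a positive advantage the objective stops growing once the ratio exceeds `1 + eps`, and with a negative advantage the clipped term is kept once the ratio falls below `1 - eps`. This limits the size of each policy update, a property commonly associated with stable training.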
Authors
张博超
温晓玲
刘璐
张雅茜
王宏光
ZHANG Bochao; WEN Xiaoling; LIU Lu; ZHANG Yaqian; WANG Hongguang (Shenyang Aircraft Design and Research Institute, Aviation Industry Corporation of China, Ltd., Shenyang 110035, China)
Source
《航空工程进展》
CSCD
2023, Issue 2, pp. 145-151 (7 pages)
Advances in Aeronautical Science and Engineering
Keywords
air combat decision
intelligent decision
reinforcement learning
proximal policy optimization
manned and unmanned aerial vehicle cooperation