Abstract: Reinforcement learning has been applied to air combat problems in recent years, and curriculum learning is often used to structure the training; however, traditional curriculum learning suffers from plasticity loss in neural networks, i.e., the difficulty of learning new knowledge after the network has converged. To this end, we propose a motivational curriculum learning distributed proximal policy optimization (MCLDPPO) algorithm, through which trained agents significantly outperform the predictive game tree and mainstream reinforcement learning methods. Motivational curriculum learning helps the agent gradually improve its combat ability by observing where the agent performs poorly and providing appropriate rewards as guidance. Furthermore, complete tactical maneuvers are encapsulated based on existing air combat knowledge, and through flexible use of these maneuvers, tactics beyond human knowledge can be realized. In addition, we design an interruption mechanism that increases the agent's decision-making frequency in emergencies: when the number of threats the agent perceives changes, the current action is interrupted so that observations are reacquired and a new decision is made. The interruption mechanism significantly improves the agent's performance. To better simulate actual air combat, we use digital twin technology to model real air battles and propose a parallel battlefield mechanism that runs multiple simulation environments simultaneously, effectively improving data throughput. The experimental results demonstrate that the agent can fully utilize situational information to make reasonable decisions and adapt its tactics in air combat, verifying the effectiveness of the proposed algorithmic framework.
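The interruption mechanism described above can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the agent commits to a tactical maneuver until the number of perceived threats changes, at which point the maneuver is interrupted and a fresh decision is made. All names (ToyEnv, select_maneuver) are hypothetical.

```python
class ToyEnv:
    """Toy environment whose observation is just the current threat count."""
    def __init__(self, threat_trace):
        self.trace = threat_trace
        self.t = 0

    def reset(self):
        self.t = 0
        return self.trace[0]

    def step(self, action):
        self.t += 1
        done = self.t >= len(self.trace) - 1
        return self.trace[min(self.t, len(self.trace) - 1)], done


def select_maneuver(threats):
    # Stand-in for the trained policy: evade if threatened, else attack.
    return "evade" if threats > 0 else "attack"


def run_with_interruption(env):
    obs = env.reset()
    action = select_maneuver(obs)
    decisions = [action]                 # log each (re)decision
    prev, done = obs, False
    while not done:
        obs, done = env.step(action)
        if obs != prev:                  # threat count changed: interrupt
            action = select_maneuver(obs)   # reacquire observation, re-decide
            decisions.append(action)
            prev = obs
    return decisions


# Threat counts 0,0,2,2,0 trigger two interruptions: attack -> evade -> attack.
print(run_with_interruption(ToyEnv([0, 0, 2, 2, 0])))
```

The point of the mechanism is that decision frequency is event-driven rather than fixed: in calm phases the agent executes whole maneuvers, while emergencies force an immediate re-decision.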
Abstract: To make system-of-systems combat simulation models easier to develop and reuse, formal specification and representation of simulation models are studied. From the viewpoint of system-of-systems combat simulation, and based on DEVS, the fundamental formalisms of the simulation model are explored, covering the entity model, the system-of-systems model, and the experiment model, each with a rigorous formal specification. The XML data exchange standard is combined with these formalisms to design an XML-based language, SCSL, to support simulation model representation. The correspondence between SCSL and the simulation model formalisms is discussed, and the syntax and semantics of the elements of SCSL are detailed. Based on the formal specification, an abstract simulation algorithm is given, and an SCSL virtual machine is designed that automatically interprets and executes simulation models represented in SCSL. Finally, an application case is presented that validates the theory and verifies SCSL.
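Since the entity models above build on DEVS, a generic illustration may help. A classic atomic DEVS model is a structure with an internal transition, an external transition, an output function, and a time-advance function. The sketch below shows a textbook processor model in Python; it illustrates standard DEVS, not the paper's SCSL.

```python
class Processor:
    """Atomic DEVS model: accepts a job, works for service_time, emits it."""
    BUSY, IDLE = "busy", "idle"

    def __init__(self, service_time=2.0):
        self.phase = self.IDLE
        self.sigma = float("inf")       # time advance ta(s): wait forever
        self.job = None
        self.service_time = service_time

    def ta(self):
        return self.sigma

    def delta_ext(self, e, x):          # external transition: input x after e
        if self.phase == self.IDLE:
            self.phase, self.job, self.sigma = self.BUSY, x, self.service_time
        else:
            self.sigma -= e             # busy: ignore input, keep working

    def output(self):                   # lambda: emitted just before delta_int
        return self.job

    def delta_int(self):                # internal transition: job finished
        self.phase, self.job, self.sigma = self.IDLE, None, float("inf")


p = Processor()
p.delta_ext(0.0, "job-1")
print(p.ta(), p.output())               # busy for 2.0, will emit "job-1"
p.delta_int()
print(p.phase)                          # back to idle
```

An SCSL document would, per the abstract, encode such a model declaratively in XML, with the SCSL virtual machine driving the transition functions according to the abstract simulation algorithm.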
Abstract: To address the autonomous decision-making problem of unmanned aerial vehicles (UAVs) in modern air combat, an attention mechanism (AM) is combined with Soft Actor-Critic (SAC), a stochastic-policy deep reinforcement learning algorithm, and a maneuver decision algorithm based on AM-SAC is proposed. In a 1v1 combat setting, a three-degree-of-freedom UAV motion model and a close-range air combat model are established, and a missile attack zone model is constructed from the relative distance and relative bearing between the two sides. The AM is introduced into the SAC algorithm to construct a weight network, enabling dynamic adjustment of reward weights during training, and simulation experiments are designed. Comparison with the plain SAC algorithm and tests under multiple initial situations verify that the AM-SAC-based maneuver decision algorithm converges faster, maneuvers more stably, performs better in air combat, and is applicable to a variety of combat scenarios.
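The dynamic reward-weighting idea can be sketched in miniature: a weight network maps situation features to softmax weights over reward components (e.g. a distance reward and an angle reward), so the shaped reward shifts emphasis as the situation changes. The two-component linear "network" below is purely illustrative and not the paper's architecture.

```python
import math

def softmax(xs):
    m = max(xs)                         # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def weighted_reward(rewards, features, W):
    """Attention-style weighting: scores = W @ features, weights = softmax."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in W]
    weights = softmax(scores)
    return sum(w * r for w, r in zip(weights, rewards))

# Two reward components: closing-distance reward and aspect-angle reward.
# Each row of W says which situation feature that component attends to.
W = [[1.0, 0.0],    # distance component attends to (normalized) distance
     [0.0, 1.0]]    # angle component attends to (normalized) bearing

# Far away (large distance feature): the distance reward dominates the mix.
r_far = weighted_reward([0.5, 1.0], [2.0, 0.1], W)
# Close in (large bearing feature): the angle reward dominates instead.
r_close = weighted_reward([0.5, 1.0], [0.1, 2.0], W)
```

In the AM-SAC setting the weight network is trained jointly with the policy, so these mixing weights adapt over the course of training rather than being hand-fixed.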