Abstract
To address the slow convergence of the value function caused by sparse reward signals in reinforcement learning, an air combat maneuvering decision method is proposed that combines the artificial potential field (APF) method with a deep Q-learning network (DQN). The air combat maneuvering situation is described, an APF model of air combat maneuvering is established, and a first-order APF reward function is designed. A DQN-based air combat decision model is then constructed, the APF-DQN air combat maneuvering decision method is proposed, and simulation tests are carried out. Simulation results show that the proposed method overcomes the sparse-reward problem and enables our fighter to perform trajectory tracking well and gain a favorable situation.
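The abstract does not give the form of the first-order APF reward, so the following is only a minimal sketch of the general idea under stated assumptions: a dense reward derived from a potential field is added to a sparse terminal reward so that every step carries a learning signal for the DQN. It swaps in the generic potential-based shaping form F = γΦ(s') − Φ(s) with a classic quadratic attractive potential U = ½·k·d², and reduces the engagement to 2-D positions. The names `K_ATT`, `CAPTURE_DIST`, `shaped_reward` and `dqn_target` and all numeric values are hypothetical, not taken from the paper.

```python
import math

# Hypothetical parameters: the paper's actual APF gains and thresholds are not given here.
K_ATT = 1.0          # attractive gain pulling our aircraft toward the target
GAMMA = 0.99         # discount factor, shared by the shaping term and the DQN target
CAPTURE_DIST = 1.0   # hypothetical range at which the sparse "advantage" reward fires


def attractive_potential(own, target):
    """Quadratic attractive potential U = 0.5 * k * d^2 (lower means closer to the target)."""
    d = math.dist(own, target)
    return 0.5 * K_ATT * d * d


def sparse_reward(own, target):
    """Sparse terminal reward: nonzero only once the target is within capture range."""
    return 1.0 if math.dist(own, target) <= CAPTURE_DIST else 0.0


def shaped_reward(own, own_next, target, target_next):
    """Sparse reward plus potential-based shaping F = gamma * Phi(s') - Phi(s), with Phi = -U."""
    phi = -attractive_potential(own, target)
    phi_next = -attractive_potential(own_next, target_next)
    return sparse_reward(own_next, target_next) + GAMMA * phi_next - phi


def dqn_target(reward, next_q_values, done):
    """One-step DQN regression target y = r + gamma * max_a' Q_target(s', a')."""
    return reward if done else reward + GAMMA * max(next_q_values)


if __name__ == "__main__":
    target = (10.0, 0.0)
    # Closing from d = 10 to d = 9 earns a positive dense reward even though the
    # sparse reward is still zero; turning away (d = 11) is penalized.
    print(shaped_reward((0.0, 0.0), (1.0, 0.0), target, target))
    print(shaped_reward((0.0, 0.0), (-1.0, 0.0), target, target))
```

Potential-based shaping is used here because adding a term of the form γΦ(s') − Φ(s) is known to leave the optimal policy unchanged; whether the paper's first-order APF reward has this property is not stated in the abstract.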
Authors
ZHANG Xiaojie (张晓杰)
ZHOU Zhongliang (周中良)
Air Force Engineering University, Xi'an 710038, China
Source
Flight Dynamics (《飞行力学》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2021, No. 5, pp. 88-94 (7 pages)
Keywords
air combat maneuvering
artificial potential field (APF)
reward function
reinforcement learning