Abstract
To address the insufficient solving speed of assignment strategy optimization algorithms in large-scale scenarios, deep reinforcement learning is combined with a Markov decision process to solve the large-scale air defense task assignment problem intelligently. Based on the characteristics of large-scale air defense operations, the agent is modeled with a Markov decision process and a digital battlefield simulation environment is built. An air defense task assignment agent is then designed and trained in this simulation environment using the proximal policy optimization algorithm. The feasibility and superiority of the method are verified on a large-scale air defense confrontation task.
Authors
刘家义
王刚
付强
郭相科
王思远
Liu Jiayi; Wang Gang; Fu Qiang; Guo Xiangke; Wang Siyuan (Air and Missile Defense College, Air Force Engineering University, Xi'an 710051, China; Graduate College, Air Force Engineering University, Xi'an 710051, China)
Source
《系统仿真学报》
CAS
CSCD
PKU Core (Peking University Core Journals)
2023, No. 8, pp. 1705-1716 (12 pages)
Journal of System Simulation
Funding
National Natural Science Foundation of China (62106283).
Keywords
assignment strategy optimization algorithm
task assignment
Markov decision process
deep reinforcement learning
agent