Abstract
This paper addresses the survival-type differential game interception problem that arises in pursuit-evasion games between a spacecraft and a non-cooperative target. Pursuit-evasion strategies are studied with reinforcement learning, and an adaptive-augmented random search (A-ARS) algorithm is proposed. First, to address the sparse-reward difficulty of sequential decision making, an exploration method based on perturbations in the policy parameter space is designed, which accelerates policy convergence. Second, to counter premature convergence to a local optimum, a novelty function is designed to guide the policy update, which improves data efficiency. Finally, the method is verified with numerical simulations, and comparisons with augmented random search (ARS), proximal policy optimization (PPO), and deep deterministic policy gradient (DDPG) demonstrate its effectiveness and advantages.
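For readers unfamiliar with the ARS baseline named in the abstract, the following is a minimal sketch of one basic augmented-random-search update, which explores by perturbing the policy parameters directly rather than the actions. It is not the authors' A-ARS: the abstract does not define the adaptive mechanism or the novelty function, so neither is reproduced here. The toy objective `rollout_return`, the constant `TARGET`, the helper names `ars_step` and `train`, and all hyperparameter values are assumptions made for illustration only.

```python
import random
import statistics

# Hypothetical optimum of the toy objective (stands in for a good policy).
TARGET = [1.0, -2.0, 0.5]


def rollout_return(theta):
    """Toy stand-in for an episode return: largest when theta equals TARGET."""
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))


def ars_step(theta, rng, n_dirs=8, top_b=4, step_size=0.02, noise=0.05):
    """One basic ARS update: perturb the parameters, keep the best directions."""
    results = []
    for _ in range(n_dirs):
        # Sample a random direction in parameter space.
        delta = [rng.gauss(0.0, 1.0) for _ in theta]
        r_plus = rollout_return([t + noise * d for t, d in zip(theta, delta)])
        r_minus = rollout_return([t - noise * d for t, d in zip(theta, delta)])
        results.append((r_plus, r_minus, delta))

    # Rank directions by the better of their two returns and keep the top_b.
    results.sort(key=lambda item: max(item[0], item[1]), reverse=True)
    best = results[:top_b]

    # Normalize the step by the spread of the returns that were kept.
    kept_returns = [r for r_plus, r_minus, _ in best for r in (r_plus, r_minus)]
    sigma_r = statistics.pstdev(kept_returns) or 1.0

    new_theta = list(theta)
    for r_plus, r_minus, delta in best:
        scale = step_size / (top_b * sigma_r) * (r_plus - r_minus)
        for i, d in enumerate(delta):
            new_theta[i] += scale * d
    return new_theta


def train(n_iters=300, seed=0):
    """Run repeated ARS updates from a zero-initialized parameter vector."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    for _ in range(n_iters):
        theta = ars_step(theta, rng)
    return theta
```

Calling `train()` returns a parameter vector near `TARGET`. In a real pursuit-evasion setting, `rollout_return` would be replaced by the return of a simulated engagement under the perturbed policy, which is where a sparse reward makes this kind of parameter-space exploration relevant.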
Authors
JIAO Jie; GOU Yongjie; WU Wenbo; PAN Binfeng (School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China; National Key Laboratory of Aerospace Flight Dynamics, Xi'an 710072, China; Shanghai Aerospace Systems Engineering Institute, Shanghai 201108, China)
Source
Journal of Northwestern Polytechnical University (《西北工业大学学报》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2024, No. 1, pp. 117-128 (12 pages)
Keywords
non-cooperative target
pursuit-evasion game
differential game
reinforcement learning
sparse reward