Abstract
Landing guidance for reusable launch vehicles must strictly guarantee landing position and velocity accuracy while minimizing fuel consumption. Landing guidance methods based on optimal control rely on an accurate rocket dynamic model and cannot generalize to model deviations. To address this problem, this paper trains a neural-network landing guidance policy via model-free reinforcement learning, using interactive sampling that requires no model of the rocket. First, a Markov decision process (MDP) model of the rocket landing guidance problem is established, and a staged reward function is designed according to the terminal constraints and the fuel-consumption index. Then, a multilayer perceptron guidance policy network is designed, and the model-free proximal policy optimization (PPO) algorithm is used to iteratively optimize the policy network through interactive sampling from the rocket landing guidance MDP. Finally, the guidance policy is validated in simulations of the landing phase of a reusable launch vehicle. The results show that the proposed reinforcement learning landing guidance policy guarantees landing accuracy with near-optimal fuel consumption, and generalizes to conditions in which the rocket model parameters deviate from their nominal values.
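The MDP formulation and staged reward described above can be sketched as follows. This is a minimal illustrative model only: the point-mass vertical dynamics, the state layout `[altitude, velocity, mass]`, and the reward coefficients are assumptions for demonstration, not the paper's actual formulation.

```python
import numpy as np

class RocketLandingMDP:
    """Illustrative point-mass vertical-landing MDP: state = [h, v, m]."""

    def __init__(self, dt=0.1, g=9.81, isp=300.0):
        self.dt = dt          # integration step [s]
        self.g = g            # gravitational acceleration [m/s^2]
        self.isp = isp        # specific impulse [s]
        self.g0 = 9.81        # reference gravity for Isp [m/s^2]
        self.state = None

    def reset(self):
        # altitude 1000 m, descending at 80 m/s, 30 t wet mass
        self.state = np.array([1000.0, -80.0, 30000.0])
        return self.state.copy()

    def step(self, thrust):
        h, v, m = self.state
        a = thrust / m - self.g                       # net acceleration
        h_new = h + v * self.dt                       # Euler integration
        v_new = v + a * self.dt
        m_new = m - thrust / (self.isp * self.g0) * self.dt  # fuel burn
        self.state = np.array([h_new, v_new, m_new])
        done = h_new <= 0.0
        return self.state.copy(), self.reward(done), done

    def reward(self, done):
        """Staged reward: a small running penalty during flight,
        and a terminal term enforcing touchdown constraints while
        rewarding remaining propellant (fuel-consumption index)."""
        h, v, m = self.state
        if not done:
            return -0.001                 # flight phase: discourage long burns
        return -abs(v) + 0.0001 * m       # terminal phase: soft landing + fuel
```

A policy network (e.g. an MLP mapping state to thrust) would be optimized against this environment by sampling trajectories and applying PPO updates; the environment itself never exposes its dynamics to the learner, which is what makes the approach model-free.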
Authors
HE Linkun, ZHANG Ran, GONG Qinghai
(School of Astronautics, Beihang University, Beijing 100191, China; Beijing Aerospace Automatic Control Institute, Beijing 100070, China)
Source
Air & Space Defense, 2021, No. 3, pp. 33-40 (8 pages)
Keywords
landing guidance
reusable launch vehicle
optimal control
reinforcement learning
vertical recovery