Intelligent guidance for no-fly zone avoidance based on reinforcement learning
Abstract: The rapid development of artificial intelligence (AI) offers a new technical approach to flight vehicle guidance. For the problem of a high-speed vehicle flying around uncertain no-fly zones, this paper proposes a progressive, three-stage research framework for intelligent no-fly zone avoidance guidance: predictor-corrector guidance, then pre-training of a bank angle guidance model by supervised learning, then further upgrading of that model by reinforcement learning. First, traditional predictor-corrector guidance is used to generate a large number of sample trajectories that avoid no-fly zones, and the bank angle guidance model is pre-trained on these samples by supervised learning. Second, the model is further trained with the Proximal Policy Optimization (PPO) algorithm from reinforcement learning. Through extensive exploratory interaction between the vehicle and an environment containing uncertain no-fly zones, guided by an effective reward design, the method exploits the strong lateral maneuverability of a vehicle with a high lift-to-drag ratio and removes the constraint that traditional predictor-corrector guidance places on the bank angle solution space, with the aim of producing a better avoidance strategy. Comparison with traditional predictor-corrector guidance and with intelligent guidance based on supervised learning verifies that the reinforcement-learning-based guidance makes full use of the vehicle's wide-domain flight capability and meets the adaptability requirements of future intelligent decision systems in uncertain avoidance scenarios.
Authors: HUI Junpeng (惠俊鹏); WANG Ren (汪韧); GUO Jifeng (郭继峰) (School of Astronautics, Harbin Institute of Technology, Harbin 150006, China; China Academy of Aerospace Science and Innovation, Beijing 100176, China)
Source: Acta Aeronautica et Astronautica Sinica (《航空学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2023, No. 11, pp. 235-247 (13 pages)
Funding: National-level project.
Keywords: intelligent guidance; no-fly zone avoidance; reinforcement learning; PPO algorithm; supervised learning
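
Illustrative note: the abstract describes two training stages after the predictor-corrector baseline, namely supervised pre-training of the bank angle model and further training with PPO. The sketch below is not the authors' implementation. It only shows, under stated assumptions, what the two loss terms commonly look like: a mean-squared-error imitation loss against bank angles taken from predictor-corrector trajectories, and the standard PPO clipped surrogate objective. The function names, the use of mean squared error, and the clip value 0.2 are assumptions chosen for illustration; the record does not state the paper's network, reward, or hyperparameters.

```python
import math


def pretraining_loss(predicted_bank, reference_bank):
    """Stage 1 (assumed form): mean squared error between the model's bank
    angles and the bank angles from predictor-corrector sample trajectories."""
    n = len(predicted_bank)
    return sum((p - r) ** 2 for p, r in zip(predicted_bank, reference_bank)) / n


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Stage 2: standard PPO clipped surrogate objective (to be maximized).

    logp_new / logp_old are log-probabilities of the sampled actions under
    the current and the data-collecting policy; clip_eps is an assumed value.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the clipped term caps how much a single update can raise an action's probability, so a policy initialized from the pre-trained model moves away from it only gradually.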