Maneuvering Decision of UCAV in Close Air Combat Based on LSTM-PPO Algorithm

Cited by: 2
Abstract With the growing military use of unmanned combat aircraft (UCAV), unmanned combat is expected to become the main combat mode of the future air battlefield. In close-range air combat the environment is complex and the engagement situation changes at high speed. Methods based on game theory cannot meet real-time requirements because of the large amount of data iteration they need, and data-driven methods suffer from long training times and low execution efficiency. To address this, a UCAV close-range air combat maneuver decision method based on a deep reinforcement learning algorithm is proposed. First, a flight drive module is built on the UCAV three-degree-of-freedom model to form the state transition update mechanism. Then, Ornstein-Uhlenbeck (OU) random noise is added to the proximal policy optimization (PPO) algorithm to improve the UCAV's ability to explore unknown regions of the state space, and a long short-term memory (LSTM) network is combined with it to strengthen learning from sequential sample data, improving the training efficiency and performance of the algorithm. Finally, three groups of close-range air combat simulation experiments are designed, and a performance comparison with the PPO algorithm verifies the effectiveness and superiority of the proposed method.
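The exploration step named in the abstract can be illustrated with a minimal sketch of a discretized Ornstein-Uhlenbeck process. The abstract gives neither the noise parameters nor the point at which the noise enters the PPO policy, so everything here is an assumption made for illustration: the class name `OUNoise`, the helper `perturb_action`, the values of `theta`, `sigma` and `dt`, the three-component action, and the choice to add the noise to the action output and clip it to a normalized range. It is not the authors' implementation.

```python
import math
import random


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process (Euler-Maruyama step).

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=0.01, seed=None):
        self.dim = dim          # number of action components
        self.mu = mu            # long-run mean the noise reverts to
        self.theta = theta      # mean-reversion rate
        self.sigma = sigma      # scale of the random perturbation
        self.dt = dt            # time step of the discretization
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Restart the process at its mean, e.g. at the start of an episode."""
        self.state = [self.mu] * self.dim

    def sample(self):
        """Advance one step: x += theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)."""
        scale = self.sigma * math.sqrt(self.dt)
        self.state = [
            x + self.theta * (self.mu - x) * self.dt + scale * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def perturb_action(action, noise, low=-1.0, high=1.0):
    """Add temporally correlated noise to an action and clip it to [low, high].

    Assumes the policy outputs a normalized action vector; the paper may
    inject the noise differently.
    """
    return [min(high, max(low, a + n)) for a, n in zip(action, noise.sample())]


if __name__ == "__main__":
    noise = OUNoise(dim=3, seed=0)
    policy_action = [0.2, -0.5, 0.9]  # placeholder for a policy output
    for _ in range(3):
        print(perturb_action(policy_action, noise))
```

Because each sample depends on the previous one, consecutive perturbations are correlated in time, unlike independent Gaussian noise. That correlation is the usual reason for choosing an OU process for exploration in continuous control tasks.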
Authors DING Wei (丁维); WANG Yuan (王渊); DING Dali (丁达理); XIE Lei (谢磊); ZHOU Huan (周欢); TAN Mulai (谭目来); LYU Chenghui (吕丞辉) (Aviation Engineering School, Air Force Engineering University, Xi'an 710038, China)
Source Journal of Air Force Engineering University (Natural Science Edition) (《空军工程大学学报(自然科学版)》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 3, pp. 19-25 (7 pages)
Funding Natural Science Foundation of Shaanxi Province (2020JQ-481).
Keywords unmanned combat aerial vehicles; air combat maneuver decision; deep reinforcement learning; proximal policy optimization; long short-term memory network