
Maneuvering Policy Optimization Algorithm Based on Generative Adversarial Proximal Policy Optimization

GA-PPO Based Maneuvering Policy Optimization Algorithm
Abstract Traditional reinforcement learning algorithms converge slowly and make insufficient use of expert experience when generating air combat maneuver policies. To address these problems, a policy generation algorithm based on generative adversarial proximal policy optimization is studied. The algorithm adopts the Discriminator-Actor-Critic (DAC) network framework. Building on the Proximal Policy Optimization (PPO) algorithm, it trains a discriminator network on expert data and environment interaction data, and feeds the discriminator output back to adjust the policy network. The constrained policy is thereby optimized toward the expert policy, which improves both the convergence efficiency of the algorithm and the utilization of expert experience. The simulation environment is based on the F-16 aircraft aerodynamic model on the JSBSim open-source platform. The simulation results show that the proposed algorithm converges more efficiently than the PPO algorithm and that the generated policy model exhibits good intelligence.
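The abstract describes two coupled pieces: a discriminator trained to tell expert data from environment interaction data, and a PPO update that receives feedback from that discriminator. The sketch below illustrates one common way such pieces are written, and is not the authors' implementation. The logistic-regression discriminator, the GAIL-style reward `-log(1 - D)`, and every name in the code (`LinearDiscriminator`, `imitation_reward`, `ppo_clip_objective`) are assumptions made for illustration; the abstract does not state the network architecture, the reward form, or how the discriminator signal is combined with the environment reward.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LinearDiscriminator:
    """Hypothetical logistic-regression stand-in for the discriminator network."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def prob_expert(self, x):
        # Estimated probability that feature vector x (state-action) is expert data.
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, expert_batch, policy_batch, lr=0.1):
        # One gradient step on binary cross-entropy: expert -> 1, policy -> 0.
        samples = [(x, 1.0) for x in expert_batch] + [(x, 0.0) for x in policy_batch]
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, y in samples:
            err = self.prob_expert(x) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        n = len(samples)
        self.w = [wi - lr * g / n for wi, g in zip(self.w, grad_w)]
        self.b -= lr * grad_b / n


def imitation_reward(disc, x, eps=1e-8):
    # Assumed GAIL-style reward: grows as x looks more expert-like to the discriminator.
    return -math.log(max(1.0 - disc.prob_expert(x), eps))


def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    # Mean PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(r * a, clipped * a)
    return total / len(ratios)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic 2-D features standing in for expert and policy state-action pairs.
    expert = [[rng.gauss(1.0, 0.3), rng.gauss(1.0, 0.3)] for _ in range(64)]
    policy = [[rng.gauss(-1.0, 0.3), rng.gauss(-1.0, 0.3)] for _ in range(64)]
    disc = LinearDiscriminator(dim=2)
    for _ in range(200):
        disc.update(expert, policy)
    print("reward near expert data:", imitation_reward(disc, [1.0, 1.0]))
    print("reward near policy data:", imitation_reward(disc, [-1.0, -1.0]))
```

In a full training loop, the advantages passed to `ppo_clip_objective` would be estimated by the value (critic) network from rewards that include the discriminator signal, so that policy updates are pulled toward expert-like behavior while the clipping keeps each update close to the previous policy.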
Authors FU Yupeng; DENG Xiangyang; ZHU Ziqiang; GAO Yang; ZHANG Limin (Naval Aviation University, Yantai, Shandong 264001, China; Tsinghua University, Beijing 100084, China)
Source Journal of Naval Aviation University, 2023, No. 3, pp. 257-261, 300 (6 pages in total)
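Funding National Defense High-Level Talent Fund Project (202220539, 202220540); Shandong Province Higher Education Institutions "Youth Innovation Team Program" (2022KJ084).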
Keywords Generative Adversarial Imitation Learning (GAIL); Proximal Policy Optimization (PPO); maneuvering decision; reinforcement learning; imitation learning