Value-filter based air-combat maneuvering optimization
Abstract To address the low data utilization efficiency and convergence difficulty of traditional reinforcement learning algorithms in air-combat maneuvering decision optimization with a large state space, the concept and principle of the value filter are proposed and analyzed. A reinforcement learning algorithm named Demonstration Policy Constrain (DPC) is presented based on the value filter, and a maneuvering decision optimization method and workflow based on the DPC algorithm are designed. With the value filter, the algorithm extracts advantage data from the replay buffer and the demonstration buffer and uses it to constrain the optimization direction of the agent's policy on the basis of state value. Simulations use the F-16 aerodynamic model on the JSBSim platform. The results show that the convergence efficiency of the algorithm is significantly improved, that the sub-optimality problem of the demonstration policy is mitigated, and that the resulting maneuvering decision model achieves good intelligence.
Authors FU Yupeng; DENG Xiangyang; ZHU Ziqiang; ZHANG Limin (School of Aviation Support, Naval Aeronautical University, Yantai 264001, China; Department of Automation, Tsinghua University, Beijing 100084, China)
Source Acta Aeronautica et Astronautica Sinica (indexed in EI, CAS, CSCD, Peking University Core), 2023, No. 22, pp. 14-27 (14 pages)
Funding National Natural Science Foundation of China (91538201); National Defense High-Level Talent Fund (202220539, 202220540).
Keywords value filter; policy constrain; maneuvering decision; reinforcement learning; imitation learning
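
The abstract says the value filter extracts "advantage data" from the replay buffer and the demonstration buffer, but it does not give the filtering criterion. A minimal sketch of one plausible reading follows, assuming a transition counts as advantageous when its observed return exceeds a critic's state-value estimate. The names `Transition`, `value_filter`, `toy_value`, and the `margin` parameter are hypothetical and do not come from the paper; the stand-in critic is a toy function, not a trained network.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Transition:
    """One stored experience (hypothetical layout, not from the paper)."""
    state: Sequence[float]
    action: int
    ret: float  # discounted return observed from this state, assumed precomputed


def value_filter(
    transitions: List[Transition],
    value_fn: Callable[[Sequence[float]], float],
    margin: float = 0.0,
) -> List[Transition]:
    """Keep transitions whose observed return beats the state-value estimate.

    Assumption: "advantage data" means ret - V(state) > margin.
    """
    return [t for t in transitions if t.ret - value_fn(t.state) > margin]


def toy_value(state: Sequence[float]) -> float:
    """Stand-in critic for illustration only: V(s) is the sum of the features."""
    return float(sum(state))


# Two small buffers with made-up numbers, purely to exercise the filter.
replay_buffer = [
    Transition((0.0, 1.0), action=0, ret=1.5),  # advantage +0.5, kept
    Transition((1.0, 1.0), action=1, ret=1.0),  # advantage -1.0, dropped
]
demo_buffer = [
    Transition((0.5, 0.5), action=2, ret=3.0),  # advantage +2.0, kept
    Transition((2.0, 0.0), action=0, ret=2.0),  # advantage  0.0, dropped
]

# Both buffers pass through the same filter before any policy update.
filtered = value_filter(replay_buffer + demo_buffer, toy_value)
print([t.action for t in filtered])
```

Under this reading, demonstration samples that the critic already rates at or above their observed return are discarded as well, which is one way a sub-optimal demonstration policy could be kept from dominating the update. How DPC actually turns the filtered data into a policy constraint is not stated in the abstract and is left out of the sketch.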