
Helicopter CGF path planning based on improved SARSA algorithm (Cited by: 3)
Abstract: Drawing on the principle of the artificial potential field algorithm, a reward generation mechanism is introduced to improve the reward generation of the traditional SARSA learning algorithm. The improved SARSA learning algorithm judges the effectiveness of each executed action and combines this with environmental information to generate dynamic rewards in real time. It thereby inherits the good control performance of the artificial potential field algorithm, can optimize its search according to continuously estimated cost-field information, and makes the reward accumulation process smoother. Simulation experiments were conducted on a model of a helicopter CGF raiding a radar position, comparing the number of iterations required for convergence and the mission success rate. The improved SARSA learning algorithm converges in half the iterations required by the traditional SARSA learning algorithm, and after 1000 iterations its mission success rate is on average 12% higher than that of the traditional SARSA learning algorithm. The simulation experiments show that the improved SARSA algorithm outperforms the traditional SARSA algorithm, with clearly better convergence speed and mission success rate, and that it can plan a safe path for the helicopter CGF.
Authors: YAO Jiangyi; ZHANG Yang; LI Xiongwei; WANG Yanchao (Equipment Simulation Training Center, Campus of Army Engineering University in Shijiazhuang, Shijiazhuang 050003, China)
Source: Journal of Ordnance Equipment Engineering (CSCD, PKU Core), 2022, No. 5, pp. 220-225 (6 pages)
Funding: National Natural Science Foundation of China (61602505).
Keywords: path planning; computer generated force; reinforcement learning; artificial potential field; dynamic reward
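
The abstract only summarizes the reward-generation idea, so the following is a minimal, hypothetical Python sketch of the general technique it describes: SARSA on a grid world where each step's reward is shaped by an artificial-potential-field term (attractive toward the goal, repulsive near threats), giving the dense, smoothly accumulating signal the paper attributes to its dynamic reward. The grid size, threat layout, and all hyperparameters below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Assumed toy setting: a 20x20 grid, one goal cell (the radar position) and two
# threat centres. None of these values come from the paper.
GRID = 20
GOAL = np.array([18, 18])
THREATS = [np.array([10, 10]), np.array([6, 14])]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]            # up, down, left, right
ALPHA, GAMMA, EPS, EPISODES, MAX_STEPS = 0.1, 0.95, 0.1, 1000, 400

def potential(cell):
    """Artificial potential field: attractive toward the goal, repulsive near threats."""
    att = 0.5 * np.linalg.norm(cell - GOAL)
    rep = sum(5.0 / (np.linalg.norm(cell - t) + 1e-3) for t in THREATS)
    return att + rep

def step(cell, a):
    """Apply an action; the dense reward is the drop in potential, plus terminal bonuses."""
    nxt = np.clip(cell + ACTIONS[a], 0, GRID - 1)
    r = potential(cell) - potential(nxt)                # dynamic, field-based reward term
    done = False
    if np.array_equal(nxt, GOAL):
        r, done = r + 100.0, True                       # reached the radar position
    elif any(np.linalg.norm(nxt - t) < 1.5 for t in THREATS):
        r, done = r - 100.0, True                       # entered a threat zone
    return nxt, r, done

def eps_greedy(Q, cell):
    if np.random.rand() < EPS:
        return np.random.randint(len(ACTIONS))
    return int(np.argmax(Q[cell[0], cell[1]]))

Q = np.zeros((GRID, GRID, len(ACTIONS)))
for _ in range(EPISODES):
    s = np.array([0, 0])
    a = eps_greedy(Q, s)
    for _ in range(MAX_STEPS):
        s2, r, done = step(s, a)
        a2 = eps_greedy(Q, s2)
        # On-policy SARSA update using the potential-field-shaped reward.
        td_target = r + (0.0 if done else GAMMA * Q[s2[0], s2[1], a2])
        Q[s[0], s[1], a] += ALPHA * (td_target - Q[s[0], s[1], a])
        s, a = s2, a2
        if done:
            break
```

Because the shaped reward falls as the agent moves down the potential field, every step carries signal even far from the goal, which is one plausible reading of why the paper reports smoother reward accumulation and faster convergence than sparse-reward SARSA.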