
UAV Path Planning Based on Reverse Reinforcement Learning (cited by: 2)
Abstract To address the slow convergence and the difficulty of designing a reward function when the deep deterministic policy gradient (DDPG) algorithm is used to plan safe obstacle-avoidance paths for unmanned aerial vehicles (UAVs), a UAV path planning algorithm that incorporates expert demonstration trajectories is proposed on the basis of inverse reinforcement learning. First, a dataset of demonstration trajectories in which an expert pilots the UAV around obstacles is collected in simulator software. Second, a hybrid sampling mechanism mixes high-quality expert demonstration data into the agent's self-exploration data when updating the network parameters, which reduces the exploration cost. Finally, the maximum entropy inverse reinforcement learning algorithm is used to recover the optimal reward function implicit in the expert experience, which addresses the difficulty of hand-designing a reward function for complex tasks. Comparative experiments show that the improved algorithm trains more efficiently and achieves better obstacle-avoidance performance.
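The abstract names two mechanisms without giving details: hybrid sampling of expert and self-exploration data, and a maximum entropy inverse reinforcement learning reward update. The sketch below illustrates both under stated assumptions and is not the authors' implementation. The function names, the `expert_ratio` parameter, and the linear reward model `r(s) = theta . phi(s)` are all hypothetical. The reward step also assumes that trajectories from the current policy approximate samples from the maximum entropy distribution, which is a common simplification.

```python
import random
from typing import List, Sequence


def hybrid_sample(agent_buffer: list, expert_buffer: list, batch_size: int,
                  expert_ratio: float, rng: random.Random) -> list:
    """Draw one training batch that mixes expert and self-explored transitions.

    expert_ratio is a hypothetical knob: the fraction of the batch taken
    from the expert demonstration buffer (the abstract does not state one).
    """
    if not 0.0 <= expert_ratio <= 1.0:
        raise ValueError("expert_ratio must lie in [0, 1]")
    # Cap each share by what the corresponding buffer actually holds.
    n_expert = min(int(round(batch_size * expert_ratio)), len(expert_buffer))
    n_agent = min(batch_size - n_expert, len(agent_buffer))
    batch = rng.sample(expert_buffer, n_expert) + rng.sample(agent_buffer, n_agent)
    rng.shuffle(batch)  # avoid a fixed expert-first ordering inside the batch
    return batch


def mean_feature_counts(trajectories: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Average, over trajectories, of the summed per-step feature vectors."""
    dim = len(trajectories[0][0])
    total = [0.0] * dim
    for trajectory in trajectories:
        for features in trajectory:
            for i, value in enumerate(features):
                total[i] += value
    return [t / len(trajectories) for t in total]


def maxent_irl_step(theta: Sequence[float],
                    expert_trajs: Sequence[Sequence[Sequence[float]]],
                    learner_trajs: Sequence[Sequence[Sequence[float]]],
                    lr: float) -> List[float]:
    """One gradient-ascent step on the MaxEnt IRL log-likelihood.

    For a linear reward, the gradient is the expert feature expectation
    minus the feature expectation under the current reward; the latter is
    approximated here by trajectories from the current policy (assumption).
    """
    mu_expert = mean_feature_counts(expert_trajs)
    mu_learner = mean_feature_counts(learner_trajs)
    return [t + lr * (e - l) for t, e, l in zip(theta, mu_expert, mu_learner)]
```

In a DDPG training loop, `hybrid_sample` would stand in for the usual uniform replay-buffer draw, and `maxent_irl_step` would be called periodically to refresh the learned reward weights before transitions are re-labelled.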
Authors YANG Xiuxia; WANG Chenlei; ZHANG Yi; YU Hao; JIANG Zijie (Naval Aviation University, Yantai 264000, China)
Affiliation Naval Aviation University
Source Electronics Optics & Control (《电光与控制》), CSCD, Peking University Core Journal, 2023, No. 8, pp. 1-7 (7 pages)
Funding Natural Science Foundation of Shandong Province (ZR2020MF090)
Keywords UAV; path planning; inverse reinforcement learning; deep deterministic policy gradient (DDPG)