Abstract
To address the slow convergence and difficult reward-function design of the Deep Deterministic Policy Gradient (DDPG) algorithm when planning safe obstacle-avoidance paths for unmanned aerial vehicles (UAVs), a UAV path planning algorithm that incorporates expert demonstration trajectories is proposed on the basis of inverse reinforcement learning. First, a dataset of demonstration trajectories, in which an expert pilots the UAV around obstacles, is collected using simulator software. Second, a hybrid sampling mechanism mixes high-quality expert demonstration trajectories into the agent's self-exploration data when updating the network parameters, which reduces the cost of exploration. Finally, a maximum entropy inverse reinforcement learning algorithm is used to recover the optimal reward function implicit in the expert experience, which addresses the difficulty of designing reward functions for complex tasks. Comparative experiments show that the improved algorithm raises training efficiency and achieves better obstacle-avoidance performance.
Authors
杨秀霞
王晨蕾
张毅
于浩
姜子劼
YANG Xiuxia; WANG Chenlei; ZHANG Yi; YU Hao; JIANG Zijie (Naval Aviation University, Yantai 264000, China)
Source
《电光与控制》
CSCD
Peking University Core Journal
2023, Issue 8, pp. 1-7 (7 pages)
Electronics Optics & Control
Funding
Natural Science Foundation of Shandong Province (ZR2020MF090).