Autonomous Obstacle Avoidance and Target Tracking of UAV Based on Meta-Reinforcement Learning (基于元强化学习的无人机自主避障与目标追踪)

Cited by: 4
Abstract: Traditional deep reinforcement learning suffers from low training efficiency and weak adaptability to changing environments when applied to autonomous obstacle avoidance and target tracking for unmanned aerial vehicles (UAVs). To overcome these problems, this paper incorporates Model-Agnostic Meta-Learning (MAML) into the Deep Deterministic Policy Gradient (DDPG) algorithm, designs an internal and external meta-parameter update rule, and proposes a Meta-Deep Deterministic Policy Gradient (Meta-DDPG) algorithm in order to improve the convergence speed and generalization ability of the model. Furthermore, basic meta-task sets are constructed in the model's pre-training stage to improve the efficiency of pre-training in practical engineering. Finally, the proposed algorithm is verified by simulation in various test environments. The results show that introducing the basic meta-task sets makes the model's pre-training more effective, that Meta-DDPG has better convergence characteristics and environmental adaptability than DDPG, and that the meta-learning method and the basic meta-task sets are generally applicable to deterministic policy reinforcement learning.
Authors: JIANG Weilai (江未来); WU Jun (吴俊); WANG Yaonan (王耀南) (College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; National Engineering Research Center of Robot Visual Perception & Control Technology, Hunan University, Changsha 410082, China)
Source: Journal of Hunan University: Natural Sciences (《湖南大学学报(自然科学版)》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2022, No. 6, pp. 101-109 (9 pages)
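Funding: National Natural Science Foundation of China (61903133, 61733004); National Key R&D Program of China, Key Special Project (2021YFC1910400); Key R&D Program of Jiangsu Province (BE2020082-1).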
Keywords: meta-reinforcement learning; unmanned aerial vehicle (UAV); autonomous obstacle avoidance; target tracking; path planning
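
Note on the inner and outer update: the abstract names an internal and external meta-parameter update rule but does not give its form. As a rough illustration only, the sketch below shows the generic MAML pattern on a toy problem in which each task is a scalar quadratic loss. It is not the paper's Meta-DDPG rule, and every name in it (`task_grad`, `inner_update`, `meta_gradient`, `meta_train`, the step sizes `alpha` and `beta`, the `targets` list) is a hypothetical choice made for this example.

```python
# Toy MAML sketch (assumption: NOT the Meta-DDPG update from the paper).
# Each "task" is the scalar loss 0.5 * (theta - target)**2.

def task_grad(theta, target):
    """Gradient of the toy task loss 0.5 * (theta - target)**2."""
    return theta - target


def inner_update(theta, target, alpha):
    """Inner loop: one gradient step adapting theta to a single task."""
    return theta - alpha * task_grad(theta, target)


def meta_gradient(theta, targets, alpha):
    """Outer-loop gradient: mean over tasks of d/dtheta loss(adapted theta)."""
    total = 0.0
    for target in targets:
        adapted = inner_update(theta, target, alpha)
        # Chain rule: d(adapted)/d(theta) = 1 - alpha for this quadratic loss.
        total += (1.0 - alpha) * task_grad(adapted, target)
    return total / len(targets)


def meta_train(theta, targets, alpha=0.1, beta=0.5, steps=200):
    """Outer loop: gradient descent on the post-adaptation loss."""
    for _ in range(steps):
        theta -= beta * meta_gradient(theta, targets, alpha)
    return theta


targets = [-1.0, 0.0, 4.0]          # hypothetical meta-task set
theta_meta = meta_train(0.0, targets)
```

In this toy setting the meta-parameter settles at the mean of the task targets, the initialization from which one inner step gives the lowest average post-adaptation loss. In an actor-critic method such as DDPG the scalar would presumably be replaced by network weights and the quadratic loss by the actor and critic objectives, which this sketch does not attempt to model.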