Abstract
The case-based heuristically accelerated Q-learning (CB-HAQL) algorithm can fail to converge to a good policy when the quality of its case base is poor. To address this, the paper proposes an improved CB-HAQL algorithm based on an effective trigger mechanism. First, a trigger-based case base update mechanism is set according to the number of iterations: the case base is generated or updated only when a threshold is reached, which protects its quality. Second, a dynamic parameter adjusts the influence of cases on action selection, so that the agent sets the strength of the heuristic according to how well it has learned the environment. Finally, experience-biased exploratory actions are added to speed up learning. Experiments show that the improved algorithm raises both policy quality and training speed, and a UAV navigation task confirms the effectiveness of the learned policy.
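The first two mechanisms in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the modulo form of the trigger, and the linear `heuristic_weight` schedule are all assumptions made for illustration, and the abstract does not give the actual formulas. The action rule follows the general heuristically accelerated Q-learning pattern of adding a weighted heuristic term to the Q-value before taking the greedy action. The experience-biased exploration step is not covered here.

```python
import random


def should_update_case_base(iteration, threshold):
    """Trigger rule (assumed form): fire only when the iteration count
    reaches a positive multiple of the threshold."""
    return iteration > 0 and iteration % threshold == 0


def heuristic_weight(mastery, xi_max=1.0):
    """Hypothetical schedule: scale the heuristic influence linearly by a
    mastery estimate clamped to [0, 1]. The paper's real rule is not given."""
    mastery = min(1.0, max(0.0, mastery))
    return xi_max * mastery


def select_action(q_row, h_row, xi, epsilon, rng=random):
    """Epsilon-greedy choice over Q(s, a) + xi * H(s, a) for one state.

    q_row: Q-values for each action in the current state.
    h_row: heuristic values (e.g. derived from a retrieved case).
    """
    if rng.random() < epsilon:
        # Exploration: pick a uniformly random action index.
        return rng.randrange(len(q_row))
    scores = [q + xi * h for q, h in zip(q_row, h_row)]
    # Exploitation: index of the highest combined score.
    return max(range(len(scores)), key=scores.__getitem__)
```

With `xi = 0` the rule reduces to plain epsilon-greedy Q-learning, so the weight alone controls how strongly a retrieved case steers the agent.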
Authors
Hu Dandan (胡丹丹)
Mo Yushuai (莫宇帅)
Robotics Institute, Civil Aviation University of China, Tianjin 300300, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core Journal (北大核心)
2020, No. 7, pp. 2068-2071 (4 pages)
Keywords
UAV
obstacle avoidance
autonomous navigation
case-based heuristically accelerated Q-learning (CB-HAQL)
trigger mechanism