Abstract
Q-Learning is a reinforcement learning method based on value functions. The traditional Q-Learning algorithm iterates inefficiently and tends to become trapped in local optima. To address these shortcomings, the algorithm is improved by introducing the A* algorithm and a dynamic search factor ε. The improved dynamic A*-Q-Learning algorithm is applied to UAV route planning in complex three-dimensional environments, and the planning results are analyzed in terms of the return function, the number of exploration steps, and running efficiency. The results show that the improved algorithm gives the UAV strong adaptability in complex environments; in addition, the dynamic search factor ε effectively keeps the agent from falling into local optima during the search, so that better paths can be found in complex terrain.
Authors
程传斌
倪艾辰
房翔宇
张亮
CHENG Chuanbin; NI Aichen; FANG Xiangyu; ZHANG Liang (School of Science, Wuhan University of Technology, Wuhan 430070, China; School of Economics, Wuhan University of Technology, Wuhan 430070, China)
Source
《现代信息科技》
2021, Issue 9, pp. 1-5, 9 (6 pages in total)
Modern Information Technology
Funding
National Natural Science Foundation of China (61573012).