Abstract
Deep deterministic policy gradient (DDPG) is an end-to-end deep reinforcement learning algorithm mainly used to solve simulation tasks. DDPG can reach near-human performance in tasks with high-dimensional action spaces, but as task complexity grows it suffers from long convergence times and poor final performance. To improve the convergence speed and final performance of the algorithm in complex task environments, a hierarchical deep deterministic policy gradient algorithm (HDDPG) based on an intermittent control framework is proposed for simulation tasks. First, under the intermittent control principle, the complex task is decomposed at the policy level into subtasks that form a hierarchical structure and each have a relatively simple optimization objective. Then, under the minimum transition principle, the DDPG algorithm is used to find the optimal solution for the tasks at each level. DDPG and HDDPG are compared on a trajectory tracking simulation task. The experimental results show that, on complex continuous motion control tasks, HDDPG converges faster than DDPG and achieves better final results.
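As a rough illustration of the hierarchical idea described in the abstract (a high-level controller that intermittently updates a sub-goal, and a low-level controller that produces continuous actions toward it), the following minimal Python sketch uses hypothetical names and a simple proportional controller as a stand-in for the DDPG actor-critic networks, whose architecture and training details are not specified here.

```python
import numpy as np

class LowLevelActor:
    """Stand-in for the low-level DDPG actor: here just a proportional controller."""
    def __init__(self, gain=0.5):
        self.gain = gain

    def act(self, state, subgoal):
        # In DDPG this would be a trained neural network mu(s, g | theta).
        return self.gain * (subgoal - state)

def high_level_subgoal(state, trajectory, horizon=5):
    """Intermittent high-level controller: pick a reference point a few steps ahead."""
    idx = np.argmin(np.linalg.norm(trajectory - state, axis=1))
    return trajectory[min(idx + horizon, len(trajectory) - 1)]

def rollout(trajectory, steps=50, replan_every=5):
    actor = LowLevelActor()
    state = trajectory[0].copy()
    subgoal = high_level_subgoal(state, trajectory)
    for t in range(steps):
        if t % replan_every == 0:              # intermittent sub-goal update
            subgoal = high_level_subgoal(state, trajectory)
        action = actor.act(state, subgoal)     # low-level continuous action
        state = state + action                 # toy transition model
    return state

if __name__ == "__main__":
    # Straight-line reference trajectory in 2-D as a toy tracking task.
    ref = np.linspace([0.0, 0.0], [10.0, 10.0], num=100)
    print("final state:", rollout(ref))
```

In the method described in the paper, each level would instead be optimized with DDPG rather than hand-tuned gains; the sketch only shows the hierarchical structure of the control loop.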
Authors
李广源
史海波
孙杳如
LI Guang-yuan; SHI Hai-bo; SUN Yao-ru (Department of Computer Science, College of Electronics and Information Engineering, Tongji University, Shanghai 201804)
Funding
National Natural Science Foundation of China (No. 91748122)
Keywords
Deep Deterministic Policy Gradient (DDPG)
Hierarchical Deep Deterministic Policy Gradient (HDDPG)
Movement Control
Deterministic Policy Gradient
Reinforcement Learning
Intermittent Control