Abstract
Deep reinforcement learning explores a large number of environment samples during training, which makes the algorithm slow to converge. Reusing or transferring knowledge learned from a previous task (the source task) has the potential to speed up convergence when the algorithm learns a new task (the target task). To improve learning efficiency, this paper proposes a transfer reinforcement learning algorithm with double Q-network learning. Based on the actor-critic framework, the algorithm transfers knowledge of the source task's optimal value function, so that the target task's value-function network evaluates the policy more accurately and guides the policy to update quickly toward the optimal policy. In OpenAI Gym experiments and in a task where a manipulator reaches a target position in three-dimensional space, the algorithm achieves better results than conventional deep reinforcement learning algorithms. The experiments show that the proposed transfer reinforcement learning algorithm with double Q-network learning converges faster and explores more stably during training.
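The transfer scheme the abstract describes, initializing the target task's twin value networks from a source task's learned value function so that the critic can evaluate the policy accurately from the start of training, can be sketched roughly as follows. This is a minimal illustrative toy with linear critics, not the paper's implementation; all names, dimensions, and the feature map are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def features(s, a):
    """Toy state-action feature map: phi(s, a)."""
    return np.concatenate([s, a, s * a])

class LinearCritic:
    """Linear value estimate Q(s, a) = w . phi(s, a)."""
    def __init__(self, dim, w=None):
        self.w = np.zeros(dim) if w is None else w.copy()

    def q(self, s, a):
        return float(self.w @ features(s, a))

dim = 6  # state dim 2 + action dim 2 + elementwise product 2

# Hypothetical source-task critic weights, assumed already trained.
source_w = rng.normal(size=dim)

# Double Q-learning style: two target-task critics, both initialized
# from the source critic so early policy evaluations are informed
# rather than random.
critic1 = LinearCritic(dim, w=source_w)
critic2 = LinearCritic(dim, w=source_w)

s, a = rng.normal(size=2), rng.normal(size=2)
# Pessimistic value estimate: the minimum of the two critics,
# which an actor-critic update would use to score the policy.
q_est = min(critic1.q(s, a), critic2.q(s, a))
```

From this warm start, each critic would then be updated on target-task transitions as usual, while the actor follows the gradient of the (pessimistic) value estimate.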
Authors
Zeng Rui
Zhou Jian
Liu Manlu
Zhang Junjun
Chen Zhuo
Zeng Rui; Zhou Jian; Liu Manlu; Zhang Junjun; Chen Zhuo (School of Manufacturing Science & Engineering, Southwest University of Science & Technology, Mianyang, Sichuan 621000, China; Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, Southwest University of Science & Technology, Mianyang, Sichuan 621000, China; School of Information Engineering, Southwest University of Science & Technology, Mianyang, Sichuan 621000, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journals (北大核心)
2021, Issue 6, pp. 1699-1703 (5 pages)
Application Research of Computers
Funding
National "13th Five-Year Plan" Nuclear Energy Development Project (20161295)
National Science and Technology Major Project of China (2019ZX06002022).