Abstract
The experience replay method in reinforcement learning reduces the correlation between state sequences through random sampling and improves the efficiency of data utilization, but at present it can only be applied in deterministic environments. In order to make full use of experience replay in a stochastic environment while keeping the original state transition distribution unchanged, we propose a tree-based experience storage structure that records the state transition probabilities observed during exploration and, based on this structure, an expected-experience-replay Q-learning algorithm. With low algorithmic complexity, the method achieves an unbiased estimate of the environment's state transitions and reduces the overestimation problem of Q-learning. Experimental results on the classical robot random-walk problem show that, compared with uniform replay and prioritized experience replay, the proposed expected-experience-replay Q-learning algorithm improves the convergence speed by about 50%.
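The abstract only outlines the approach, so the following is a minimal illustrative sketch (in Python) of the two ingredients it names: a tree-like store of observed transition counts per state-action pair, and a Q-learning update whose target is an expectation over that stored next-state distribution rather than a single sampled transition. The names TransitionTree and expected_replay_update, the reward bookkeeping, and the exact update rule are assumptions made for illustration, not the authors' published implementation.

from collections import defaultdict

class TransitionTree:
    """Illustrative tree-like store of observed (state, action) -> next_state
    transition counts; the last observed reward is kept per branch.
    (Hypothetical sketch, not the authors' implementation.)"""

    def __init__(self):
        # state -> action -> {next_state: (count, reward)}
        self.root = defaultdict(lambda: defaultdict(dict))

    def add(self, s, a, r, s_next):
        branch = self.root[s][a]
        count, _ = branch.get(s_next, (0, 0.0))
        branch[s_next] = (count + 1, r)

    def distribution(self, s, a):
        # Empirical next-state distribution for (s, a): converges to the
        # true transition probabilities as the branch counts grow.
        branch = self.root[s][a]
        total = sum(count for count, _ in branch.values())
        return [(s_next, count / total, r) for s_next, (count, r) in branch.items()]


def expected_replay_update(Q, tree, s, a, alpha=0.1, gamma=0.99):
    """One expected-replay-style Q-learning update: the TD target is averaged
    over the stored next-state distribution instead of one sampled successor."""
    target = 0.0
    for s_next, p, r in tree.distribution(s, a):
        target += p * (r + gamma * max(Q[s_next].values(), default=0.0))
    Q[s][a] += alpha * (target - Q[s][a])


# Usage sketch: a tabular Q-table and one stochastic (s, a) pair with two outcomes.
Q = defaultdict(lambda: defaultdict(float))
tree = TransitionTree()
tree.add(s=0, a=1, r=1.0, s_next=2)
tree.add(s=0, a=1, r=0.0, s_next=3)   # same (s, a), different outcome
expected_replay_update(Q, tree, s=0, a=1)

Averaging the bootstrapped target over all recorded successors is what keeps the effective transition distribution unchanged under replay; with a single stored successor the update degenerates to ordinary sampled Q-learning.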
Authors
张峰
钱辉
董春茹
花强
ZHANG Feng; QIAN Hui; DONG Chunru; HUA Qiang (Hebei Key Laboratory of Machine Learning and Computational Intelligence, College of Mathematics and Information Science, Hebei University, Baoding 071002, Hebei Province, P. R. China)
Source
《深圳大学学报(理工版)》
EI
CAS
CSCD
Peking University Core Journals
2020, No. 2, pp. 202-207 (6 pages)
Journal of Shenzhen University (Science and Engineering)
Funding
Natural Science Foundation of Hebei Province (F2017201020, F2018201115)
Key Project of Science and Technology Research of the Education Department of Hebei Province (ZD2019021)
Youth Fund of the Education Department of Hebei Province (QN2017019)
Keywords
artificial intelligence
machine learning
reinforcement learning
experience replay
Q-learning algorithm
stochastic environment
convergence
overestimation