
An Expected Experience Replay Based Q-Learning Algorithm with Random State Transition (cited by: 2)
Abstract Experience replay in reinforcement learning reduces the correlation between successive state sequences and improves data efficiency, but at present it can only be applied in environments with deterministic state transitions. To make full use of experience replay in a stochastic environment while preserving the original state transition distribution, we propose a tree-based experience storage structure that records the state transition probabilities observed during exploration and, building on this structure, a Q-learning algorithm based on expected experience replay. The method yields an unbiased estimate of the environment's state transitions while keeping algorithmic complexity low, and it reduces the overestimation problem of Q-learning. Experiments on the classical robot random-walk problem show that, compared with uniform experience replay and prioritized experience replay, the proposed algorithm improves convergence speed by about 50%.
Authors ZHANG Feng (张峰); QIAN Hui (钱辉); DONG Chunru (董春茹); HUA Qiang (花强) (Hebei Key Laboratory of Machine Learning and Computational Intelligence, College of Mathematics and Information Science, Hebei University, Baoding 071002, Hebei Province, P.R. China)
Source Journal of Shenzhen University (Science and Engineering) (《深圳大学学报(理工版)》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2020, No. 2, pp. 202-207 (6 pages)
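Funding Natural Science Foundation of Hebei Province, General Program (F2017201020, F2018201115); Key Science and Technology Research Project of the Education Department of Hebei Province (ZD2019021); Youth Fund of the Education Department of Hebei Province (QN2017019)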
Keywords artificial intelligence; machine learning; reinforcement learning; experience replay; Q-learning algorithm; stochastic environment; convergence; overestimation
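
The abstract describes storing observed transition probabilities in a tree and replaying their expectation rather than a single sampled transition. The record above does not include the paper's pseudocode, so the following is only a minimal sketch of that general idea, not the authors' implementation. The names `TransitionTree` and `expected_replay_update`, the count-based layout (state, then action, then next state), the `terminal` argument, and the default `alpha` and `gamma` values are all assumptions made for illustration.

```python
from collections import defaultdict


class TransitionTree:
    """Hypothetical store: state -> action -> next_state -> [visit count, reward sum]."""

    def __init__(self):
        self.tree = defaultdict(lambda: defaultdict(dict))

    def add(self, s, a, r, s_next):
        # Each observed transition increments the count on its leaf.
        leaf = self.tree[s][a].setdefault(s_next, [0, 0.0])
        leaf[0] += 1
        leaf[1] += r

    def outcomes(self, s, a):
        """Yield (empirical probability, mean reward, next_state) per observed outcome."""
        leaves = self.tree[s][a]
        total = sum(count for count, _ in leaves.values())
        for s_next, (count, reward_sum) in leaves.items():
            yield count / total, reward_sum / count, s_next


def expected_replay_update(q, tree, s, a, actions, alpha=0.5, gamma=0.9, terminal=()):
    """One Q-learning update whose target is averaged over all observed next states."""
    target = 0.0
    for p, r_mean, s_next in tree.outcomes(s, a):
        # Terminal states contribute no bootstrapped future value.
        future = 0.0 if s_next in terminal else max(q[(s_next, b)] for b in actions)
        target += p * (r_mean + gamma * future)
    q[(s, a)] += alpha * (target - q[(s, a)])
    return target


# Usage: the same (state, action) pair is observed four times with a random outcome.
tree = TransitionTree()
for s_next, r in [("B", 0.0), ("B", 0.0), ("B", 0.0), ("goal", 1.0)]:
    tree.add("A", "right", r, s_next)

q = defaultdict(float)  # Q-table keyed by (state, action), initialised to zero
target = expected_replay_update(
    q, tree, "A", "right", actions=["left", "right"], terminal={"goal"}
)
```

In this sketch the update target weights each observed next state by its empirical frequency, so replaying a pair does not depend on which single outcome happens to be sampled from the buffer. Whether this matches the storage and update rules in the paper would need to be checked against the full text.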