
Least-Squares Based Q(λ) Algorithm for Reinforcement Learning
Abstract: The classical Q(λ) learning algorithm suffers from slow convergence and low efficiency of experience exploitation. To address this, a least-squares approximation model of the state-action value function is constructed from current and previous (multi-step) experience samples, and a set of linear equations satisfied by the weight vector of the approximator on a set of basis functions is derived. From this, a fast and practical least-squares Q(λ) algorithm and an improved recursive variant are proposed. Inverted-pendulum experiments demonstrate that these algorithms effectively improve convergence speed and the efficiency of experience exploitation.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2008, No. 34, pp. 47-50.
Funding: Jiangsu Province University Natural Science Basic Research Project No. 07KJD520092.
Keywords: reinforcement learning; Q(λ) learning; function approximation; least squares; inverted pendulum
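The approach summarized in the abstract can be sketched in code. This is a minimal illustration only, assuming a linear architecture Q(s, a) = wᵀφ(s, a) and an LSTD-Q-style least-squares solve over a batch of samples, with a Sherman-Morrison rank-one update for the recursive variant; the function names, the regularization term, and the exact update form are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def lstsq_q_weights(phi, phi_next, rewards, gamma=0.9, reg=1e-6):
    """Batch least-squares solve for the weights of a linear
    Q-function approximator Q(s, a) = w . phi(s, a).

    phi      : (T, k) basis features of the visited state-action pairs
    phi_next : (T, k) features of the successor (greedy) pairs
    rewards  : (T,)   observed rewards

    Solves the linear system A w = b with
        A = Phi^T (Phi - gamma * Phi') + reg * I,   b = Phi^T r,
    the standard LSTD-Q normal equations (reg is a small ridge term
    added here for numerical stability).
    """
    k = phi.shape[1]
    A = phi.T @ (phi - gamma * phi_next) + reg * np.eye(k)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)

def rls_update(A_inv, b, phi_t, phi_next_t, r_t, gamma=0.9):
    """One recursive step: fold a single new sample into the cached
    inverse of A via the Sherman-Morrison identity, avoiding a full
    O(k^3) re-solve. Returns the updated (A_inv, b, w)."""
    u = phi_t                        # rank-1 update: A += u v^T
    v = phi_t - gamma * phi_next_t
    Au = A_inv @ u
    vA = v @ A_inv
    A_inv = A_inv - np.outer(Au, vA) / (1.0 + v @ Au)
    b = b + r_t * phi_t
    return A_inv, b, A_inv @ b
```

The recursive form updates the cached inverse in O(k²) per sample instead of re-solving a k×k system from scratch, which is the usual motivation for a recursive least-squares variant such as the one the abstract proposes.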

References

  • 1 Watkins C J C H, Dayan P. Q-learning[J]. Machine Learning, 1992, 8(1): 279-292.
  • 2 Sutton R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1988, 3: 9-44.
  • 3 Xu Xin, He Hangen. A gradient algorithm for reinforcement learning based on neural networks[J]. Chinese Journal of Computers, 2003, 26(2): 227-233.
  • 4 Barreto A d M S, Anderson C W. Restricted gradient-descent algorithm for value-function approximation in reinforcement learning[J]. Artificial Intelligence, 2008: 454-482.
  • 5 Kaelbling L P, Littman M L, Moore A W. Reinforcement learning: A survey[J]. Journal of Artificial Intelligence Research, 1996, 4: 237-285.
  • 6 Rezzoug N, Gorce P. A reinforcement learning based neural network architecture for obstacle avoidance in multi-fingered grasp synthesis[J]. Neurocomputing, 2008, 26(1).
  • 7 Erden M S, Leblebicioglu K. Free gait generation with reinforcement learning for a six-legged robot[J]. Robotics and Autonomous Systems, 2008: 199-212.
  • 8 Peng J, Williams R J. Incremental multi-step Q-learning[J]. Machine Learning, 1996, 22(4): 283-290.
  • 9 Sutton R S, Barto A G. Reinforcement Learning: An Introduction[M]. Cambridge, MA: MIT Press, 1998.
  • 10 Lagoudakis M G, Parr R. Least-squares policy iteration[J]. Journal of Machine Learning Research, 2003, 4: 1107-1149.
