
A Kernel Based Reinforcement Learning Algorithm (基于核方法的强化学习算法)

Cited by: 1
Abstract: Traditional reinforcement learning algorithms usually assume that the state and action spaces are discrete, yet many real-world tasks inherently have continuous state spaces, which greatly limits the practical use of reinforcement learning. To overcome this shortcoming, this paper develops a kernel-based reinforcement learning algorithm that handles problems with continuous state spaces directly. The algorithm is validated on the classical mountain car task, which has a continuous state space and a discrete action space. Experiments show that, compared with the traditional approach of discretizing the state space first, the algorithm converges to better policies with less training data (a minimal illustrative sketch of the kernel-based approach follows the keyword list below).
Authors: He Yuan, Zhang Wensheng (何源, 张文生)
Source: Microcomputer Information (《微计算机信息》, Control & Automation), Peking University Core Journal, 2008, No. 4, pp. 243-245 (3 pages)
Funding: National Basic Research Program of China (973 Program), project "Machine Learning and Data Description", Grant No. 2004CB318103
Keywords: reinforcement learning; kernel method; Markov decision process; Q-learning; mountain car
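
The abstract describes the method only at a high level. To make the idea concrete, here is a minimal, hypothetical Python sketch of kernel-based value iteration in the spirit of reference 5 below (Ormoneit and Sen, 2002), applied to the mountain car task the paper uses for validation. Everything in the sketch is an illustrative assumption rather than the paper's implementation: the function names (mc_step, collect_samples, kernel_q_iteration), the Gaussian kernel, the per-dimension bandwidths, and the uniform random sampling scheme.

```python
import numpy as np

# Standard mountain car dynamics (Sutton & Barto, 1998): continuous state
# (position, velocity), three discrete actions {-1, 0, +1}.
def mc_step(state, action):
    x, v = state
    v = np.clip(v + 0.001 * action - 0.0025 * np.cos(3 * x), -0.07, 0.07)
    x = np.clip(x + v, -1.2, 0.6)
    if x == -1.2:
        v = max(v, 0.0)          # inelastic collision with the left wall
    reward = 0.0 if x >= 0.6 else -1.0
    return np.array([x, v]), reward

def collect_samples(n_per_action, actions=(-1, 0, 1), seed=0):
    # Uniform random exploration (an assumption): store (s, r, s')
    # transition arrays separately for each discrete action.
    rng = np.random.default_rng(seed)
    samples = {}
    for a in actions:
        S = np.column_stack([rng.uniform(-1.2, 0.6, n_per_action),
                             rng.uniform(-0.07, 0.07, n_per_action)])
        S2, R = zip(*(mc_step(s, a) for s in S))
        samples[a] = (S, np.array(R), np.array(S2))
    return samples

def gaussian_kernel(u):
    return np.exp(-0.5 * np.sum(u * u, axis=-1))

def kernel_q_iteration(samples, gamma=0.99, bandwidth=0.1, sweeps=30):
    # Kernel-weighted approximate value iteration: Q(s, a) is a kernel
    # average of r_i + gamma * max_b Q(s'_i, b) over the stored transitions
    # that were taken with action a (after Ormoneit & Sen, 2002).
    actions = list(samples)
    scale = np.array([1.8, 0.14]) * bandwidth   # per-dimension bandwidths
    q_next = {a: np.zeros(len(samples[a][1])) for a in actions}

    def q_value(s, a, qn):
        S, R, _ = samples[a]
        w = gaussian_kernel((S - s) / scale)
        return w @ (R + gamma * qn[a]) / (w.sum() + 1e-12)

    for _ in range(sweeps):
        # The comprehension reads the previous sweep's q_next (synchronous update).
        q_next = {a: np.array([max(q_value(s2, b, q_next) for b in actions)
                               for s2 in samples[a][2]])
                  for a in actions}
    return lambda s, a: q_value(np.asarray(s, dtype=float), a, q_next)

# Greedy action at the classic start state (-0.5, 0).
q = kernel_q_iteration(collect_samples(200))
print(max((-1, 0, 1), key=lambda a: q([-0.5, 0.0], a)))
```

The kernel average acts as a soft nearest-neighbour interpolator over stored transitions, which is what lets such a method operate on a continuous state space without discretizing it first; the bandwidth controls the bias-variance trade-off of the value estimate.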

References (7)

  • 1. Sutton, R. S. and Barto, A. G. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
  • 2. Ye Deqian, Jin Dabing, Yang Ying. Research and design of a stock prediction system based on reinforcement learning [J]. Microcomputer Information, 2006, 22(02X): 149-151. (Cited: 4)
  • 3. Singh, S. and Bertsekas, D. Reinforcement learning for dynamic channel allocation in cellular telephone systems. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, volume 9, page 974. The MIT Press, 1997.
  • 4. Tesauro, G. Neurogammon wins Computer Olympiad. Neural Computation, 1(3): 321-323, 1989.
  • 5. Ormoneit, D. and Sen, S. Kernel-based reinforcement learning. Machine Learning, 49: 161-178, 2002.
  • 6. Fan, J. and Gijbels, I. Local Polynomial Modelling and Its Applications. Chapman and Hall, 1996.
  • 7. Jong, Nicholas K. and Stone, Peter. Kernel-based models for reinforcement learning. Kernel Machines for Reinforcement Learning Workshop, Pittsburgh, PA, 2006.

Secondary References (2)

Co-citing Literature (3)

Co-cited Literature (31)

  • 1. Chen Zonghai, Wen Feng, Nie Jianbin, Wu Xiaoshu. Reinforcement learning method based on a node-growing k-means clustering algorithm [J]. Journal of Computer Research and Development, 2006, 43(4): 661-666. (Cited: 13)
  • 2. Gao Yang, Hu Jingkai, Wang Bennian, Wang Dongli. Elevator group control scheduling based on CMAC network reinforcement learning [J]. Acta Electronica Sinica, 2007, 35(2): 362-365. (Cited: 13)
  • 3. Su S, Lee Z, Wang Y. Robust and fast learning for fuzzy cerebellar model articulation controllers. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2006, 36(1): 203-208.
  • 4. Ernst D, Geurts P, Wehenkel L. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 2005, 6: 503-556.
  • 5. Monson, Christopher Kenneth. Reinforcement learning in the joint space: value iteration in worlds with continuous states and actions. Master's thesis, Brigham Young University, 2003.
  • 6. Baird L. Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the Twelfth International Conference on Machine Learning, Morgan Kaufmann, 1995: 30-37.
  • 7. Munos, Remi and Moore, Andrew. Variable resolution discretization in optimal control. Machine Learning, 2002, 49: 291-323.
  • 8. Whiteson, Shimon and Stone, Peter. Evolutionary function approximation for reinforcement learning. Journal of Machine Learning Research, 2006, 7: 877-917.
  • 9. Bertsekas D P, Tsitsiklis J N. Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.
  • 10. Pazis, Jason and Lagoudakis, Michail G. Learning continuous action control policies. 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009: 169-176.

Citing Literature (1)

Secondary Citing Literature (2)
