Q-learning System Based on Cooperative Least Squares Support Vector Machine

Original title: 基于协同最小二乘支持向量机的Q学习 (cited by 20)
Abstract: To address the slow convergence of reinforcement learning systems, this paper proposes a Q-learning system based on a cooperative least squares support vector machine, suited to continuous state spaces and discrete action spaces. The system consists of a least squares support vector regression machine (LS-SVRM) and a least squares support vector classification machine (LS-SVCM). The LS-SVRM approximates the mapping from state-action pairs to the value function, while the LS-SVCM approximates the mapping from the continuous state space to the discrete action space. In addition, the LS-SVCM supplies the LS-SVRM with real-time, dynamic knowledge or advice (suggested action values) to accelerate value-function learning. Simulations of minimum-time mountain-car control show that, compared with a Q-learning system based on a single LS-SVRM, the proposed system converges faster and achieves better learning performance.
Source: Acta Automatica Sinica (《自动化学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2009, No. 2, pp. 214-219 (6 pages)
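Funding: Supported by the National Natural Science Foundation of China (60804022), the Specialized Research Fund for the Doctoral Program of Higher Education (20070290537, 200802901506), the China Postdoctoral Science Foundation (20070411064), the Natural Science Foundation of Jiangsu Province (BK2008126), and the Jiangsu Postdoctoral Science Foundation (0601033B)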
Keywords: reinforcement learning; Q-learning; cooperative; least squares support vector machine (LS-SVM); mapping
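The two-machine structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard LS-SVM linear system with an RBF kernel for both machines, encodes discrete actions as one-vs-rest ±1 targets for the classifier, and uses hypothetical names (`LSSVM`, `one_hot_pm`, `q_values`, `suggest_action`) and toy data. How the paper combines the suggested action with the value estimates is not stated in the abstract, so the sketch only computes the two quantities side by side.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

class LSSVM:
    """LS-SVM function estimator (hypothetical helper); one output per target column."""
    def __init__(self, gamma=100.0, sigma=1.0):
        self.gamma, self.sigma = gamma, sigma

    def fit(self, X, Y):
        X = np.asarray(X, float)
        Y = np.asarray(Y, float).reshape(len(X), -1)
        n = len(X)
        # Linear system of the LS-SVM: [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = rbf_kernel(X, X, self.sigma) + np.eye(n) / self.gamma
        rhs = np.vstack([np.zeros((1, Y.shape[1])), Y])
        sol = np.linalg.solve(A, rhs)
        self.b, self.alpha, self.X = sol[0], sol[1:], X
        return self

    def predict(self, X):
        return rbf_kernel(np.asarray(X, float), self.X, self.sigma) @ self.alpha + self.b

def one_hot_pm(actions, n_actions):
    """Encode action indices as +1/-1 one-vs-rest targets (an assumption of this sketch)."""
    T = -np.ones((len(actions), n_actions))
    T[np.arange(len(actions)), actions] = 1.0
    return T

def q_values(svrm, state, n_actions):
    """Regressor role: estimate Q(s, a) for every discrete action from (state, action) pairs."""
    pairs = np.column_stack([np.tile(state, (n_actions, 1)), np.arange(n_actions)])
    return svrm.predict(pairs)[:, 0]

def suggest_action(svcm, state):
    """Classifier role: map a continuous state to a suggested discrete action."""
    return int(np.argmax(svcm.predict(state[None, :])[0]))

# Toy data (hypothetical): 1-D state, two actions.
states = np.array([[-1.0], [-0.5], [0.5], [1.0]])
actions = np.array([0, 0, 1, 1])
svcm = LSSVM(gamma=100.0).fit(states, one_hot_pm(actions, 2))

pairs = np.array([[-1.0, 0], [-1.0, 1], [1.0, 0], [1.0, 1]])
q_targets = np.array([-1.0, -2.0, -0.5, 0.0])
svrm = LSSVM(gamma=1e4).fit(pairs, q_targets)

s = np.array([0.8])
print("suggested action:", suggest_action(svcm, s))
print("Q estimates:", q_values(svrm, s, 2))
```

In a full Q-learning loop the regressor would be refit (or updated) on temporal-difference targets as new transitions arrive, and the classifier on the actions chosen so far; those update rules are left out here because the abstract does not specify them.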