Abstract
To address the slow convergence of reinforcement learning systems, a Q learning system based on cooperative least squares support vector machines is proposed for problems with a continuous state space and a discrete action space. The system consists of a least squares support vector regression machine (LS-SVRM) and a least squares support vector classification machine (LS-SVCM). The LS-SVRM approximates the mapping from state-action pairs to the value function, while the LS-SVCM approximates the mapping from the continuous state space to the discrete action space. In addition, the LS-SVCM supplies the LS-SVRM with dynamic, real-time knowledge or advice (suggested action values) to accelerate value-function learning. Simulations of minimum-time mountain-car control show that, compared with a Q learning system based on a single LS-SVRM, the proposed system converges faster and achieves better learning performance.
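As an illustration of the value-function half of this architecture, the following is a minimal sketch of standard LS-SVM regression applied to state-action pairs, followed by greedy selection over a discrete action set. It is not the authors' implementation: the function names (`rbf_kernel`, `lssvr_fit`, `lssvr_predict`, `greedy_action`), the Gaussian kernel, and the values of `gamma` and `sigma` are assumptions made for the example, and the LS-SVCM advice mechanism and any online update scheme are not reproduced because the abstract does not specify them.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def lssvr_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve the LS-SVM regression linear system for (alpha, b).

    System: [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
    """
    n = X.shape[0]
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(M, rhs)
    return sol[1:], sol[0]  # alpha, b


def lssvr_predict(X_train, alpha, b, X_new, sigma=1.0):
    """Evaluate the fitted regressor at the rows of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


def greedy_action(X_train, alpha, b, state, actions, sigma=1.0):
    """Return the discrete action with the largest approximated Q(state, action)."""
    pairs = np.array([np.append(state, a) for a in actions], dtype=float)
    q = lssvr_predict(X_train, alpha, b, pairs, sigma)
    return actions[int(np.argmax(q))]


# Toy usage: one-dimensional state, three discrete actions (hypothetical data).
X_train = np.array([[0.0, -1.0], [0.0, 0.0], [0.0, 1.0]])  # rows are (state, action)
q_targets = np.array([0.0, 1.0, 5.0])                      # made-up Q targets
alpha, b = lssvr_fit(X_train, q_targets, gamma=1000.0)
best = greedy_action(X_train, alpha, b, np.array([0.0]), [-1, 0, 1])
```

Because the LS-SVM solution comes from a single linear system rather than a quadratic program, refitting on a small sample set is cheap, which is one reason this family of models is convenient as a function approximator inside a learning loop.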
Source
Acta Automatica Sinica (《自动化学报》)
EI
CSCD
Peking University Core Journals (北大核心)
2009, No. 2, pp. 214-219 (6 pages)
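Funding
National Natural Science Foundation of China (60804022)
Specialized Research Fund for the Doctoral Program of Higher Education (20070290537, 200802901506)
China Postdoctoral Science Foundation (20070411064)
Natural Science Foundation of Jiangsu Province (BK2008126)
Jiangsu Province Postdoctoral Science Foundation (0601033B)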
Keywords
Reinforcement learning, Q learning, cooperative, least squares support vector machine (LS-SVM), mapping