Abstract
Classical Q(λ) learning suffers from low efficiency of experience exploitation and slow convergence. To address this, a least-squares approximation model of the state-action value function is constructed from current and multi-step experience samples, and a set of linear equations satisfied by the weight vector of the function approximator on a set of basis functions is derived. A fast and practical least-squares Q(λ) algorithm and an improved recursive variant are then proposed. Inverted-pendulum experiments demonstrate that these algorithms improve the efficiency of experience exploitation and effectively speed up convergence.
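The core idea in the abstract — solving a linear system for the weight vector of a linear Q-function approximator, with multi-step (eligibility-trace) credit assignment — can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the authors' exact algorithm; the function name, the SARSA-style transition tuples, and the small ridge term for numerical stability are all our own choices:

```python
import numpy as np

def lstd_q_lambda(transitions, phi, gamma=0.9, lam=0.7):
    """Least-squares estimate of the weight vector w so that
    Q(s, a) ~= phi(s, a) @ w, built from an on-policy trajectory of
    (s, a, r, s_next, a_next) transitions.

    Solves A w = b with
      A = sum_t z_t (phi(s_t, a_t) - gamma * phi(s'_t, a'_t))^T
      b = sum_t z_t * r_t
    where z_t = gamma * lam * z_{t-1} + phi(s_t, a_t) is the
    eligibility trace over the feature vectors.
    """
    k = phi(*transitions[0][:2]).shape[0]
    A = np.zeros((k, k))
    b = np.zeros(k)
    z = np.zeros(k)  # eligibility trace, decayed by gamma * lam each step
    for s, a, r, s_next, a_next in transitions:
        f = phi(s, a)
        z = gamma * lam * z + f
        A += np.outer(z, f - gamma * phi(s_next, a_next))
        b += z * r
    # Small ridge term keeps the system well-conditioned (our addition).
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)
```

For example, in a one-state, one-action chain with reward 1 per step and gamma = 0.9, the fixed point is Q = 1 / (1 - gamma) = 10, which the solver recovers from the batch of transitions in a single linear solve rather than by many incremental temporal-difference updates.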
Source
《计算机工程与应用》
CSCD
PKU Core Journal (北大核心)
2008, No. 34, pp. 47-50 (4 pages)
Computer Engineering and Applications
Funding
Natural Science Basic Research Project of Jiangsu Province Higher Education Institutions, No. 07KJD520092
Keywords
reinforcement learning
Q(λ) learning
function approximation
least squares
inverted pendulum