

Average Reward Multi-Step Temporal-Difference Learning Using Recursive Least-Squares Methods
Abstract: Average-reward temporal-difference (TD) learning over an irreducible, aperiodic Markov chain with linear function approximation is investigated. The approximator is a linear combination of fixed basis functions whose weights are updated incrementally. Building on a comparative analysis of existing algorithms and on results from linear parameter estimation theory, a new class of average-reward multi-step TD reinforcement learning algorithms based on linear value-function approximation and recursive least-squares methods is proposed, and a proof of its uniform convergence is given.
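The paper's exact algorithm and convergence proof are not reproduced in this abstract, but the ingredients it names (linear combination of fixed basis functions, a tracked average reward, eligibility traces for the multi-step case, and a recursive least-squares weight update) can be sketched. The following is a minimal, illustrative Python sketch of an average-reward RLS-TD(λ) pass, assuming the common Sherman-Morrison form of the RLS update; the function name, hyperparameters, and initialization are assumptions, not taken from the paper.

```python
import numpy as np

def rls_td_average_reward(features, rewards, next_features,
                          lam=0.7, mu=1.0, beta=0.05, p0=100.0):
    """Sketch of one trajectory pass of average-reward RLS-TD(lambda).

    features[t] / next_features[t] are phi(s_t) / phi(s_{t+1});
    rewards[t] is r_t.  Returns the weight vector theta and the running
    average-reward estimate rho.  All hyperparameters (lam, mu, beta, p0)
    are illustrative choices, not values from the paper.
    """
    n = features.shape[1]
    theta = np.zeros(n)           # weights of the linear approximator
    P = p0 * np.eye(n)            # RLS inverse-correlation matrix
    z = np.zeros(n)               # eligibility trace (multi-step credit)
    rho = 0.0                     # average-reward estimate
    for phi, r, phi_next in zip(features, rewards, next_features):
        rho += beta * (r - rho)               # track the average reward
        z = lam * z + phi                     # undiscounted trace update
        d = phi - phi_next                    # differential feature change
        gain = (P @ z) / (mu + d @ P @ z)     # RLS gain (Sherman-Morrison)
        delta = (r - rho) - d @ theta         # average-reward TD error
        theta = theta + gain * delta
        P = (P - np.outer(gain, d @ P)) / mu  # recursive inverse update
    return theta, rho
```

On a toy two-state chain that alternates rewards 1 and 3 (true average reward 2), `rho` settles near 2 and the weight difference recovers the differential-value gap between the two states up to an additive constant; the additive constant itself is not pinned down, a known indeterminacy of average-reward value functions.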
Published in: Journal of Inner Mongolia University (Natural Science Edition), 2008, No. 5, pp. 560-565 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journal list.
Funding: Guangxi Natural Science Foundation (0481016); Ministry of Education Key Project (204031); Inner Mongolia University Doctoral Fund (203043); Inner Mongolia University "513" Talent Program (205144).
Keywords: temporal-difference learning; average reward; function approximation; least squares; recursive

