
SARSA(λ) Algorithm Based on Heuristic Function (cited by: 2)
Abstract: Reinforcement learning is an important machine learning method that has been applied successfully to many decision-making problems, such as robot path planning and intelligent control, and it has become an important branch of machine learning research. To address its slow convergence, slow knowledge acquisition, and the difficulty of balancing exploration and exploitation, this paper proposes an improvement to the SARSA(λ) algorithm. The improved method uses prior experience to derive a heuristic function from features of the environment; this function guides policy selection and refines the reward function, which speeds up convergence. In simulation comparisons, the improved algorithm obtains reward feedback faster than SARSA(λ), which indicates that it is effective for knowledge learning.
Source: Computer & Digital Engineering (《计算机与数字工程》), 2016, No. 5, pp. 825-828 (4 pages)
Keywords: reinforcement learning; SARSA(λ); heuristic function; assessment learning
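
The abstract does not define the heuristic function, so the following is only a rough sketch of the general idea, not the paper's method. It assumes a small deterministic gridworld, a hypothetical Manhattan-distance heuristic `heuristic(s, a)`, and a hypothetical weight `xi` that adds the heuristic to the Q-value during action selection. Only the policy-selection use of the heuristic is shown; the reward-function use mentioned in the abstract is left out. All function and parameter names are illustrative assumptions.

```python
import random

ACTIONS = [(0, 1), (0, -1), (-1, 0), (1, 0)]  # up, down, left, right

def move(state, action, size):
    """Deterministic grid transition, clipped at the walls."""
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), size - 1)
    y = min(max(state[1] + dy, 0), size - 1)
    return (x, y)

def heuristic(state, action, goal, size):
    """Assumed heuristic H(s, a): 1 if the action moves closer to the goal, else 0."""
    def dist(s):
        return abs(s[0] - goal[0]) + abs(s[1] - goal[1])
    return 1.0 if dist(move(state, action, size)) < dist(state) else 0.0

def choose(q, state, goal, size, epsilon, xi, rng):
    """Epsilon-greedy over Q(s, a) + xi * H(s, a)."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    scores = [q.get((state, a), 0.0) + xi * heuristic(state, a, goal, size)
              for a in range(len(ACTIONS))]
    best = max(scores)
    return rng.choice([a for a, s in enumerate(scores) if s == best])

def sarsa_lambda(size=5, episodes=50, alpha=0.1, gamma=0.95, lam=0.9,
                 epsilon=0.1, xi=1.0, max_steps=500, seed=0):
    """Tabular SARSA(lambda) with replacing traces; returns Q and steps per episode."""
    rng = random.Random(seed)
    goal = (size - 1, size - 1)
    q, steps_per_episode = {}, []
    for _ in range(episodes):
        traces = {}  # eligibility traces, reset at the start of each episode
        state = (0, 0)
        action = choose(q, state, goal, size, epsilon, xi, rng)
        steps = 0
        while state != goal and steps < max_steps:
            nxt = move(state, action, size)
            reward = 0.0 if nxt == goal else -1.0
            nxt_action = choose(q, nxt, goal, size, epsilon, xi, rng)
            # On-policy target: uses the action actually selected in the next state.
            q_next = 0.0 if nxt == goal else q.get((nxt, nxt_action), 0.0)
            delta = reward + gamma * q_next - q.get((state, action), 0.0)
            traces[(state, action)] = 1.0  # replacing trace
            for key in list(traces):
                q[key] = q.get(key, 0.0) + alpha * delta * traces[key]
                traces[key] *= gamma * lam  # decay every trace after the update
            state, action = nxt, nxt_action
            steps += 1
        steps_per_episode.append(steps)
    return q, steps_per_episode
```

In this sketch, setting `xi=0` removes the heuristic term and leaves plain tabular SARSA(λ), so the two variants can be compared on the same environment by looking at the returned steps per episode.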

