Reinforcement Learning Based on Hidden Biasing Information Learning (基于隐偏向信息学习的强化学习算法)

Cited by: 4
Abstract  When applied to Markov decision processes (MDPs) with large state and action spaces and complex tasks, traditional reinforcement learning algorithms run into the curse of dimensionality, which leads to slow convergence and long training times. Effectively learning and using the biasing information contained in a problem can speed up learning and improve learning efficiency. Based on an analysis of the characteristics of bias mechanisms, this paper introduces the concept of hidden biasing information, builds a reinforcement learning model based on biasing information learning, and proposes an improved feature-based SARSA(λ) algorithm. Experiments on a box-pushing task show that the improved algorithm markedly improves learning efficiency.
Source  Journal of Nanhua University (Science & Engineering) (《南华大学学报(理工版)》), 2004, No. 2, pp. 10-16 (7 pages)
Funding  National Natural Science Foundation of China (6020317); National Special Fund for Basic Research in Science and Technology (2001DE20016-02-04).
Keywords  reinforcement learning; Markov decision process; bias; hidden biasing information; SARSA(λ) algorithm; complexity

References (16)

  • 1. Sutton R S, Barto A G. Reinforcement learning: an introduction[M]. MA: MIT Press, 1998.
  • 2. Brown X T. Low power wireless communication via reinforcement learning[A]. In: Advances in Neural Information Processing Systems[C]. MIT Press, 2000(12): 893-899.
  • 3. Mataric M J. Getting humanoids to move and imitate[J]. IEEE Intelligent Systems, 2000(7): 18-24.
  • 4. Millán R, Posenato D, Dedieu E. Continuous-action Q-learning[J]. Machine Learning, 2002(49): 247-265.
  • 5. Shapiro D. Value-driven agents[D]. Ph.D. thesis, Stanford University, 2001.
  • 6. Rennie J, McCallum A. Using reinforcement learning to spider the web efficiently[A]. In: Proc of International Conference on Machine Learning (ICML)[C]. 1999.
  • 7. Sutton R S. Open theoretical questions in reinforcement learning[A]. In: Proc of EuroCOLT'99[C]. 1999, 11-17.
  • 8. Chen Huanwen, Xie Lijuan, Xie Jianping. A forgetting algorithm for a class of value-function reinforcement learning[J]. Journal of Computer Research and Development, 2001, 38(4): 487-494. Cited by: 14
  • 9. Barto A G, Mahadevan S. Recent advances in hierarchical reinforcement learning[J]. Special Issue on Reinforcement Learning, Discrete Event Systems, 2003, 23(4): 197-223.
  • 10. Hailu G, Sommer G. On amount and quality of bias in reinforcement learning[A]. In: Proc of IEEE SMC'99[C]. 1999, 1491-1495.

Second-level references (5)

Co-citing literature (13)

Co-cited literature (37)

Citing literature (4)

Second-level citing literature (9)
