Abstract
When traditional reinforcement learning algorithms are applied to Markov decision processes (MDPs) with large state and action spaces and complex tasks, they run into the curse of dimensionality, which leads to slow convergence and long training times. Effectively learning and exploiting the biasing information contained in a problem can speed up learning and improve learning efficiency. Based on an analysis of the characteristics of bias mechanisms, this paper introduces the concept of hidden biasing information, builds a reinforcement learning model based on learning that biasing information, and proposes an improved feature-based SARSA(λ) algorithm. Experiments on a box-pushing task show that the improved algorithm significantly increases learning efficiency.
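For readers unfamiliar with the baseline, the following is a minimal sketch of standard tabular SARSA(λ) with accumulating eligibility traces. It is not the paper's feature-based algorithm. The corridor environment, the function name `sarsa_lambda`, and the `bias` table (a per-state, per-action bonus added only during action selection) are assumptions introduced purely for illustration of how prior biasing information might shorten exploration.

```python
import random

def sarsa_lambda(n_states=6, episodes=200, alpha=0.1, gamma=0.95,
                 lam=0.8, epsilon=0.1, bias=None, seed=0):
    """Tabular SARSA(lambda) on a toy 1-D corridor (illustrative only).

    Actions: 0 = left, 1 = right. Entering the last state yields reward 1
    and ends the episode. `bias` is a hypothetical table of bonuses that
    only influences action selection, never the learned Q-values directly.
    """
    rng = random.Random(seed)
    n_actions = 2
    q = [[0.0] * n_actions for _ in range(n_states)]
    bias = bias or [[0.0] * n_actions for _ in range(n_states)]

    def choose(s):
        # Epsilon-greedy over Q plus the (assumed) bias bonus.
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        scores = [q[s][a] + bias[s][a] for a in range(n_actions)]
        best = max(scores)
        return rng.choice([a for a in range(n_actions) if scores[a] == best])

    def step(s, a):
        # Deterministic corridor dynamics with a wall at state 0.
        s2 = max(0, s - 1) if a == 0 else s + 1
        done = s2 == n_states - 1
        return s2, (1.0 if done else 0.0), done

    steps_per_episode = []
    for _ in range(episodes):
        trace = [[0.0] * n_actions for _ in range(n_states)]
        s, done, steps = 0, False, 0
        a = choose(s)
        while not done:
            s2, r, done = step(s, a)
            a2 = None if done else choose(s2)
            target = r if done else r + gamma * q[s2][a2]
            delta = target - q[s][a]          # on-policy TD error
            trace[s][a] += 1.0                # accumulating trace
            for i in range(n_states):
                for j in range(n_actions):
                    q[i][j] += alpha * delta * trace[i][j]
                    trace[i][j] *= gamma * lam
            s, a = s2, a2
            steps += 1
        steps_per_episode.append(steps)
    return q, steps_per_episode
```

Calling `sarsa_lambda()` returns the learned Q-table and the number of steps taken in each episode; passing a `bias` table that favors the "right" action is one way to compare biased and unbiased exploration on this toy problem.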
Source
Journal of Nanhua University (Science & Engineering) (《南华大学学报(理工版)》)
2004, Issue 2, pp. 10-16 (7 pages)
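Funding
National Natural Science Foundation of China (Grant No. 6020317)
National Special Fund for Basic Research in Science and Technology (Grant No. 2001DE20016-02-04)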