Abstract
Reinforcement learning is an important machine learning method. Its successful application to robot path planning, intelligent control, and many other decision-making problems has made it an important branch of machine learning research. However, it suffers from slow convergence, slow knowledge acquisition, and the difficulty of balancing exploration and exploitation. To address these problems, this paper proposes an improvement to the SARSA(λ) algorithm: drawing on empirical knowledge, it extracts from the features of the environment a heuristic function that is used both to select better policies and to optimize the reward function, thereby accelerating convergence. Simulation comparisons show that the improved algorithm obtains reward feedback faster than SARSA(λ), which demonstrates its effectiveness in learning knowledge.
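The abstract does not give the form of the heuristic function, so the following is only a minimal sketch of how a heuristic might enter tabular SARSA(λ) in the two places the abstract names: action selection and the reward signal. Everything specific here is an assumption made for illustration and is not taken from the paper: the grid world, the Manhattan-distance heuristic, the weight `xi`, the one-step lookahead through a known `step` model, and the use of potential-based reward shaping as the way the heuristic adjusts the reward.

```python
import random
from collections import defaultdict

# Hypothetical 4-connected grid world; moves are (dx, dy) offsets.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def heuristic(state, goal):
    """Assumed heuristic: negative Manhattan distance to the goal."""
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

def step(state, action, size, goal):
    """Move inside a size x size grid; walls clamp the position."""
    nx = min(max(state[0] + action[0], 0), size - 1)
    ny = min(max(state[1] + action[1], 0), size - 1)
    nxt = (nx, ny)
    reward = 1.0 if nxt == goal else -0.01
    return nxt, reward, nxt == goal

def choose_action(q, state, goal, size, epsilon, xi, rng):
    """Epsilon-greedy over Q plus a heuristic bonus (assumed weighting xi)."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))

    def score(a):
        # One-step lookahead assumes the transition model is known.
        nxt, _, _ = step(state, ACTIONS[a], size, goal)
        return q[(state, a)] + xi * heuristic(nxt, goal)

    return max(range(len(ACTIONS)), key=score)

def sarsa_lambda(size=5, episodes=200, alpha=0.1, gamma=0.95, lam=0.9,
                 epsilon=0.1, xi=0.5, max_steps=200, seed=0):
    rng = random.Random(seed)
    goal = (size - 1, size - 1)
    q = defaultdict(float)
    steps_per_episode = []
    for _ in range(episodes):
        traces = defaultdict(float)  # eligibility traces, reset each episode
        state = (0, 0)
        action = choose_action(q, state, goal, size, epsilon, xi, rng)
        for t in range(1, max_steps + 1):
            nxt, reward, done = step(state, ACTIONS[action], size, goal)
            # Potential-based shaping (an assumption): r + gamma*h(s') - h(s).
            shaped = reward + gamma * heuristic(nxt, goal) - heuristic(state, goal)
            nxt_action = choose_action(q, nxt, goal, size, epsilon, xi, rng)
            target = shaped + (0.0 if done else gamma * q[(nxt, nxt_action)])
            delta = target - q[(state, action)]
            traces[(state, action)] += 1.0  # accumulating trace
            for key in list(traces):
                q[key] += alpha * delta * traces[key]
                traces[key] *= gamma * lam
            state, action = nxt, nxt_action
            if done:
                break
        steps_per_episode.append(t)
    return q, steps_per_episode

if __name__ == "__main__":
    _, steps = sarsa_lambda()
    print("steps in first episode:", steps[0])
    print("steps in last episode:", steps[-1])
```

Setting `xi=0` and removing the shaping term reduces this sketch to plain SARSA(λ) with accumulating traces, so the per-episode step counts of the two variants can be compared on the same grid.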
Source
《计算机与数字工程》
2016, No. 5, pp. 825-828 (4 pages)
Computer & Digital Engineering