Abstract
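To address the goal-state problem in reinforcement learning algorithms, this paper proposes an optimization model that reflects both the distance between the current state and the goal state and the transition cost, and designs COSTRLA, a reinforcement learning algorithm based on the credit of optimal state transition. The algorithm defines a credit function for optimal state transitions and designs the learning rules for updating it. Applied to the maze problem, COSTRLA proves more effective than traditional reinforcement learning algorithms at handling goal-state problems.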
Classical reinforcement learning algorithms aim to maximize a distributed reinforcement signal, but they are not effective at solving goal-state problems. To solve goal-state problems efficiently, this paper proposes a new optimal behavior model based on the shortest-path principle, which measures the distance between the current state and the goal state as well as the cost of transition. Using this model, it designs a reinforcement learning algorithm based on the credit of optimal state transition, named COSTRLA. COSTRLA defines a credit of optimal state transition (COST) function to evaluate how optimal the output strategy is, and develops learning rules for updating the COST function. Experiments on the maze problem show that COSTRLA performs better than classical reinforcement learning algorithms on goal-state problems.
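The abstract does not give the COST function or its update rule, so the sketch below does not reproduce COSTRLA. It only illustrates the shortest-path principle the model is said to rest on: a plain dynamic-programming sweep that computes, for each free cell of a maze, the minimum cost to reach the goal state, followed by a greedy walk along decreasing cost. The maze layout, the uniform `STEP_COST`, and the names `cost_to_go` and `greedy_path` are all assumptions made for illustration.

```python
# Illustrative sketch only: NOT the COSTRLA update rule from the paper.
# Hypothetical 4x4 maze: '#' is a wall, 'G' is the goal state, '.' is free.
MAZE = [
    "..#.",
    ".##.",
    "...G",
    ".#..",
]

MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
STEP_COST = 1.0  # assumed uniform transition cost
INF = float("inf")


def neighbors(maze, r, c):
    """Yield the free cells adjacent to (r, c)."""
    for dr, dc in MOVES:
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(maze) and 0 <= nc < len(maze[0]) and maze[nr][nc] != "#":
            yield nr, nc


def cost_to_go(maze):
    """Sweep the grid until no cell's minimum cost to the goal can improve."""
    cost = [[0.0 if ch == "G" else INF for ch in row] for row in maze]
    changed = True
    while changed:
        changed = False
        for r, row in enumerate(maze):
            for c, ch in enumerate(row):
                if ch in "#G":
                    continue  # walls stay unreachable, the goal stays at 0
                for nr, nc in neighbors(maze, r, c):
                    candidate = STEP_COST + cost[nr][nc]
                    if candidate < cost[r][c]:
                        cost[r][c] = candidate
                        changed = True
    return cost


def greedy_path(maze, cost, start):
    """Follow the neighbor with the lowest cost-to-go until the goal is reached."""
    r, c = start
    if cost[r][c] == INF:
        return []  # the goal cannot be reached from this cell
    path = [start]
    while cost[r][c] > 0:
        r, c = min(neighbors(maze, r, c), key=lambda p: cost[p[0]][p[1]])
        path.append((r, c))
    return path
```

Calling `cost_to_go(MAZE)` and then `greedy_path(MAZE, cost, (0, 0))` yields a path from the top-left cell to the goal. A learning algorithm such as the one described would have to estimate a comparable quantity from experience rather than from a known map.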
Source
《计算机工程》
CAS
CSCD
PKU Core Journals (北大核心)
2004, Issue 1, pp. 88-89, 94 (3 pages in total)
Computer Engineering
Keywords
Reinforcement learning
Dynamic programming
Goal state
Shortest path