Reinforcement Learning Algorithm Based on Credit of Optimal State Transition
Abstract: Classical reinforcement learning algorithms maximize a distributed reinforcement signal, but they are not effective for solving goal-state problems. To solve such problems efficiently, this paper applies the shortest-path principle and proposes a new optimal behavior model that measures both the distance between the current state and the goal state and the cost of each state transition. Based on this model, it designs COSTRLA, a reinforcement learning algorithm built on the credit of optimal state transition. COSTRLA defines a credit-of-optimal-state-transition (COST) function to evaluate how optimal the output strategy is, and develops learning rules for updating the COST function. Experiments on a maze problem show that COSTRLA handles goal-state problems more effectively than classical reinforcement learning algorithms.
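The abstract does not give COSTRLA's actual COST function or its update rules, so the following is only a minimal sketch of the general idea under stated assumptions: a small grid maze, a uniform cost per move, and a tabular estimate C(s, a) of the remaining cost to the goal, learned with a Q-learning-style temporal-difference rule that minimizes accumulated transition cost instead of maximizing reward. It is not the paper's algorithm. The maze layout and every name in it (`learn_credit`, `greedy_path`, `bfs_distance`, `STEP_COST`) are hypothetical. A breadth-first search supplies the true shortest-path length for comparison.

```python
import random
from collections import deque

# Hypothetical maze (not from the paper): '#' wall, 'S' start, 'G' goal.
MAZE = [
    "S..#",
    ".#..",
    "...#",
    "#.G.",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
STEP_COST = 1.0  # assumed uniform transition cost


def find(ch):
    for r, row in enumerate(MAZE):
        c = row.find(ch)
        if c >= 0:
            return (r, c)
    raise ValueError(ch)


START, GOAL = find("S"), find("G")


def step(state, a):
    """Deterministic move; bumping into a wall or the edge leaves the agent in place."""
    r, c = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] != "#":
        return (r, c)
    return state


def learn_credit(episodes=2000, alpha=1.0, epsilon=0.2, max_steps=200, seed=0):
    """Learn C(s, a): estimated total cost from s to the goal when taking action a first.
    Zero initialisation is optimistic (costs are >= 0), which drives exploration.
    alpha=1.0 suffices here because the transitions are deterministic."""
    rng = random.Random(seed)
    credit = {}  # (state, action) -> estimated cost-to-goal

    def C(s, a):
        return credit.get((s, a), 0.0)

    for _ in range(episodes):
        s = START
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = min(range(len(ACTIONS)), key=lambda b: C(s, b))  # cheapest estimate
            s2 = step(s, a)
            # The goal is terminal: its remaining cost is zero.
            future = 0.0 if s2 == GOAL else min(C(s2, b) for b in range(len(ACTIONS)))
            credit[(s, a)] = C(s, a) + alpha * (STEP_COST + future - C(s, a))
            s = s2
            if s == GOAL:
                break
    return credit


def greedy_path(credit, max_steps=50):
    """Follow the lowest estimated cost-to-goal from the start state."""
    s, path = START, [START]
    while s != GOAL and len(path) <= max_steps:
        a = min(range(len(ACTIONS)), key=lambda b: credit.get((s, b), 0.0))
        s = step(s, a)
        path.append(s)
    return path


def bfs_distance(start, goal):
    """True shortest-path length (in moves) for checking the learned estimates."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for a in range(len(ACTIONS)):
            s2 = step(s, a)
            if s2 not in dist:
                dist[s2] = dist[s] + 1
                queue.append(s2)
    return dist[goal]
```

For example, `greedy_path(learn_credit())` should return a start-to-goal route whose length matches `bfs_distance(START, GOAL)`. Costs are minimized rather than rewards maximized, so the learned estimate plays the role of a distance to the goal that accumulates transition costs, which is the quantity the abstract's model is described as measuring.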
Source: Computer Engineering (计算机工程), 2004, No. 1, pp. 88-89, 94 (3 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Keywords: reinforcement learning; dynamic programming; goal state; shortest path