
Reinforcement learning algorithm based on minimum state method and average reward    Cited by: 15
Abstract: Q-learning that uses the discounted reward as its evaluation criterion cannot reflect how an action affects subsequent states. To address this, an AR-Q-Learning algorithm combining the average reward with Q-learning is proposed, and its convergence is proved. To address the "curse of dimensionality", in which the number of learning parameters grows geometrically with the number of state variables, a minimum state variable idea is proposed. The minimum state variable idea and the average reward are applied to reinforcement learning in the Blocks World; experimental results show that the method better captures the after-effect of actions, converges faster than Q-learning, and alleviates the curse of dimensionality in the Blocks World to a certain extent.
Source: 《通信学报》 (Journal on Communications), EI / CSCD / Peking University Core Journal, 2011, No. 1, pp. 66-71 (6 pages)
Funding: National Natural Science Foundation of China (60873116, 61070223, 61070122); Natural Science Foundation of Jiangsu Province (BK2008161, BK2009116); Natural Science Research Foundation of Jiangsu Higher Education Institutions (09KJA520002); Jiangsu Engineering Research and Development Center of Support Software for Modern Enterprise Information Application (SX200804)
Keywords: reinforcement learning; average reward; Tetris; minimum state
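The abstract above contrasts discounted Q-learning with an average-reward criterion. As a rough illustration only, the following is a minimal tabular sketch of an average-reward Q update in the style of R-learning, where a running average-reward estimate rho takes the place of the discount factor. The env object with reset(), actions(s), and step(s, a) is a hypothetical interface assumed for this sketch; this is not the paper's AR-Q-Learning implementation or its Blocks World state encoding.

    import random
    from collections import defaultdict

    def ar_q_learning(env, episodes=500, alpha=0.1, beta=0.01, epsilon=0.1):
        # Q[(state, action)] holds relative action values; rho estimates the
        # long-run average reward and replaces the discount factor.
        Q = defaultdict(float)
        rho = 0.0
        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                acts = env.actions(s)
                # epsilon-greedy action selection
                if random.random() < epsilon:
                    a = random.choice(acts)
                else:
                    a = max(acts, key=lambda x: Q[(s, x)])
                greedy = Q[(s, a)] >= max(Q[(s, x)] for x in acts)
                s2, r, done = env.step(s, a)
                best_next = 0.0 if done else max(Q[(s2, x)] for x in env.actions(s2))
                # average-reward TD error: subtracting rho (instead of discounting)
                # lets an action's value reflect its effect on later states
                delta = r - rho + best_next - Q[(s, a)]
                Q[(s, a)] += alpha * delta
                if greedy:
                    # adjust rho only on greedy steps (a common R-learning convention)
                    rho += beta * delta
                s = s2
        return Q, rho

In such a sketch, the minimum state variable idea would correspond to choosing what is stored in s so that it keeps only the features needed to select an action, which is what limits the geometric growth of the table with the number of state variables.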

References: 3

Secondary references: 63

  • 1 Bernstein D, Zilberstein S, Immerman N. The Complexity of Decentralized Control of Markov Decision Processes. In: Proc of the 16th Conference on Uncertainty in Artificial Intelligence. Stanford, USA, 2000: 32-37.
  • 2 Singh S P, Jaakkola T, Jordan M I. Reinforcement Learning with Soft State Aggregation. In: Tesauro G, Touretzky D S, Leen T K, eds. Advances in Neural Information Processing Systems 7. Cambridge, USA: MIT Press, 1995: 361-368.
  • 3 Moriarty D, Schultz A, Grefenstette J. Evolutionary Algorithms for Reinforcement Learning. Journal of Artificial Intelligence Research, 1999, 11: 241-276.
  • 4 Bertsekas D P, Tsitsiklis J N. Neuro-Dynamic Programming. Belmont, USA: Athena Scientific, 1996.
  • 5 Barto A G, Mahadevan S. Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(4): 41-77.
  • 6 Sutton R S, Precup D, Singh S P. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 1999, 112(1-2): 181-211.
  • 7 Parr R. Hierarchical Control and Learning for Markov Decision Processes. Ph.D. Dissertation. University of California, Berkeley, USA, 1998.
  • 8 Dietterich T G. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, 2000, 13: 227-303.
  • 9 Minsky M L. Theory of Neural-Analog Reinforcement Systems and Its Application to the Brain-Model Problem. Ph.D. Dissertation. Princeton University, Princeton, USA, 1954.
  • 10 Bellman R E, Dreyfus S E. Applied Dynamic Programming. Princeton, USA: Princeton University Press, 1962.

Co-citing literature: 302

Co-cited literature: 111

  • 1 魏英姿, 赵明扬. Design of Heuristic Reward Functions in Reinforcement Learning Algorithms and Analysis of Their Convergence [J]. 计算机科学 (Computer Science), 2005, 32(3): 190-193. Cited: 13.
  • 2 陈宗海, 文锋, 聂建斌, 吴晓曙. A Reinforcement Learning Method Based on the Node-Growing k-Means Clustering Algorithm [J]. 计算机研究与发展 (Journal of Computer Research and Development), 2006, 43(4): 661-666. Cited: 13.
  • 3 Watkins C J C H. Learning from Delayed Rewards. Ph.D. Dissertation. Cambridge, UK: Cambridge University, 1989.
  • 4 Watkins C J C H, Dayan P. Q-Learning. Machine Learning, 1992, 8(3/4): 279-292.
  • 5 Szepesvári C. The Asymptotic Convergence-Rate of Q-Learning // Proc of the 10th Neural Information Processing Systems. Cambridge, USA: MIT Press, 1997: 1064-1070.
  • 6 Sutton R S, Barto A G. Reinforcement Learning. Cambridge, USA: MIT Press, 1998.
  • 7 Even-Dar E, Mansour Y. Learning Rates for Q-Learning. Journal of Machine Learning Research, 2003, 5: 1-25.
  • 8 Ernst D, Geurts P, Wehenkel L. Tree-Based Batch Mode Reinforcement Learning. Journal of Machine Learning Research, 2005, 6(4): 503-556.
  • 9 Strehl A L, Li L H, Wiewiora E, et al. PAC Model-Free Reinforcement Learning // Proc of the 23rd International Conference on Machine Learning. New York, USA, 2006: 881-888.
  • 10 Maei H R, Szepesvári C, Bhatnagar S, et al. Toward Off-Policy Learning Control with Function Approximation // Proc of the 27th International Conference on Machine Learning. Haifa, Israel, 2010: 719-726.

Citing literature: 15

Secondary citing literature: 79
