
一种基于启发式奖赏函数的分层强化学习方法 (Cited by: 11)

A Hierarchical Reinforcement Learning Method Based on Heuristic Reward Function
Abstract: Reinforcement learning controls an autonomous agent in an unknown environment, usually described as a state space. The agent has no prior knowledge of the environment and can acquire knowledge only by acting in it. Reinforcement learning, and Q-learning in particular, faces a major obstacle: learning the Q-function in tabular form may be infeasible, because the memory needed to store the table is excessive and the Q-function converges only after each state has been visited many times. Large state spaces therefore inevitably produce the "curse of dimensionality", in which the size of the state space grows exponentially with the number of features and convergence slows down. To address this problem, a hierarchical reinforcement learning method based on a heuristic reward function is proposed. The method greatly reduces the environment's state space and speeds up learning: actions are chosen purposefully and efficiently so as to optimize the reward function and accelerate convergence. The method is applied to a Tetris simulation platform. Analysis of the algorithm and the experimental results, with the parameters set as described, show that the hierarchical reinforcement learning method with a heuristic reward function can alleviate the "curse of dimensionality" to a certain extent and converges quickly.
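The abstract describes two ingredients: a tabular value function over a reduced, feature-based state space, and a heuristic reward that steers action selection. Below is a minimal sketch, in Python, of that general idea only — tabular Q-learning whose reward is augmented by a hand-crafted heuristic over Tetris-style features. The feature tuple (holes, height, lines), the weights inside heuristic(), and the function names are illustrative assumptions, not the paper's actual design or subtask hierarchy.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # learning rate, discount factor, exploration rate

def heuristic(features):
    # Hypothetical heuristic reward: reward cleared lines, penalize holes and stack height.
    holes, height, lines = features
    return 1.0 * lines - 0.5 * holes - 0.2 * height

# Sparse Q-table over (feature-state, action) pairs.
Q = defaultdict(float)

def choose_action(state, actions):
    # Epsilon-greedy selection over the tabular Q-function.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, env_reward, next_state, next_features, actions):
    # One Q-learning backup; the environment reward is augmented by the heuristic term.
    shaped = env_reward + heuristic(next_features)
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (shaped + GAMMA * best_next - Q[(state, action)])

# Example backup: the board after a move is summarized by the feature tuple (holes, height, lines).
q_update(state=(2, 5, 0), action="rotate_left", env_reward=0.0,
         next_state=(2, 6, 0), next_features=(2, 6, 0),
         actions=["rotate_left", "rotate_right", "drop"])

In the paper's hierarchical setting the same shaping idea is applied within subtasks; a single flat learner is shown here only to make the reward-shaping step concrete.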
Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, Peking University Core), 2011, No. 12, pp. 2352-2358 (7 pages).
Funding: National Natural Science Foundation of China (60873116, 61070223, 61070122); Natural Science Foundation of Jiangsu Province (BK2008161, BK2009116); Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (09KJA520002); Jiangsu Province Engineering Research Center of Support Software for Modern Enterprise Information Application (SX200804).
Keywords: hierarchical reinforcement learning; trial-and-error; heuristic reward function; Tetris; curse of dimensionality
