
A parallel scheduling algorithm for reinforcement learning in large state space

Abstract: The main challenge in the area of reinforcement learning is scaling up to larger and more complex problems. Aiming at the scaling problem of reinforcement learning, a scalable reinforcement learning method, DCS-SRL, is proposed on the basis of a divide-and-conquer strategy, and its convergence is proved. In this method, the learning problem in a large or continuous state space is decomposed into multiple smaller subproblems. Given a specific learning algorithm, each subproblem can be solved independently with limited available resources. In the end, the component solutions are recombined to obtain the desired result. To address the question of prioritizing subproblems in the scheduler, a weighted priority scheduling algorithm is proposed. This scheduling algorithm ensures that computation is focused on regions of the problem space that are expected to be maximally productive. To expedite the learning process, a new parallel method, DCS-SPRL, is derived by combining DCS-SRL with a parallel scheduling architecture. In DCS-SPRL, the subproblems are distributed among processors that can work in parallel. The experimental results show that learning based on DCS-SPRL converges quickly and scales well.
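The abstract gives only the high-level scheme, so the following is a minimal illustrative sketch in Python, not the paper's DCS-SRL/DCS-SPRL implementation. The toy chain world, the Subproblem class, and the choice of the largest recent Bellman error as the scheduling weight are all assumptions made here for illustration; the paper's actual weighted priority definition and recombination procedure are not reproduced.

```python
import random
from collections import defaultdict

# Hypothetical toy setting: a 1-D chain of states partitioned into
# contiguous regions; each region becomes one independently solvable
# subproblem, mirroring the divide-and-conquer decomposition.
N_STATES, N_REGIONS = 100, 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # move left / right along the chain

def step(s, a):
    """One move along the chain; reward 1 only at the goal state."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

class Subproblem:
    """Tabular Q-learning restricted to one region of the state space."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.q = defaultdict(float)  # unseen (state, action) pairs default to 0
        self.priority = 1.0          # scheduler weight; starts optimistic

    def sweep(self, episodes=20, alpha=0.1, gamma=0.95, eps=0.2):
        """A bounded amount of learning; returns the largest Bellman error."""
        max_delta = 0.0
        for _ in range(episodes):
            s = random.randrange(self.lo, self.hi + 1)
            for _ in range(50):
                if random.random() < eps:
                    a = random.choice(ACTIONS)
                else:
                    a = max(ACTIONS, key=lambda act: self.q[(s, act)])
                s2, r = step(s, a)
                target = r + gamma * max(self.q[(s2, b)] for b in ACTIONS)
                delta = target - self.q[(s, a)]
                self.q[(s, a)] += alpha * delta
                max_delta = max(max_delta, abs(delta))
                if s2 == GOAL:
                    break
                s = s2
        # Stand-in weight: regions whose values are still changing are
        # expected to be the most productive places to compute next.
        self.priority = max_delta
        return max_delta

# Decompose the state space into contiguous regions, one subproblem each.
width = N_STATES // N_REGIONS
regions = [Subproblem(i * width, (i + 1) * width - 1) for i in range(N_REGIONS)]

# Weighted priority scheduling: always hand the next learning sweep to
# the subproblem with the largest weight.
for _ in range(50):
    max(regions, key=lambda sub: sub.priority).sweep()

# Recombine the component solutions into one global value table.
global_q = {}
for sub in regions:
    global_q.update(sub.q)
```

Because each subproblem learns independently, a parallel variant in the spirit of DCS-SPRL could dispatch the sweep() calls for the highest-weight subproblems to a pool of worker processes (e.g., Python's multiprocessing.Pool) instead of running them sequentially; this sketch also ignores the coupling between region boundaries that a full recombination step would have to handle.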
Source: Frontiers of Computer Science (SCIE, EI, CSCD), 2012, No. 6, pp. 631-646 (16 pages).
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61272005, 61070223, 61103045, 60970015, and 61170020), the Natural Science Foundation of Jiangsu Province (BK2012616, BK2009116), the Natural Science Foundation of the Jiangsu Higher Education Institutions (09KJA520002), and the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (93K172012K04).
Keywords: divide-and-conquer strategy, parallel schedule, scalability, large state space, continuous state space
