
A parallel scheduling algorithm for reinforcement learning in large state space

Abstract: The main challenge in the area of reinforcement learning is scaling up to larger and more complex problems. Aiming at the scaling problem of reinforcement learning, a scalable reinforcement learning method, DCS-SRL, is proposed on the basis of a divide-and-conquer strategy, and its convergence is proved. In this method, the learning problem in a large or continuous state space is decomposed into multiple smaller subproblems. Given a specific learning algorithm, each subproblem can be solved independently with limited available resources. In the end, the component solutions are recombined to obtain the desired result. To address the question of prioritizing subproblems in the scheduler, a weighted priority scheduling algorithm is proposed. This scheduling algorithm ensures that computation is focused on regions of the problem space that are expected to be maximally productive. To expedite the learning process, a new parallel method, DCS-SPRL, is derived by combining DCS-SRL with a parallel scheduling architecture. In DCS-SPRL, the subproblems are distributed among processors that can work in parallel. The experimental results show that learning based on DCS-SPRL converges quickly and scales well.
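The abstract gives only the high-level scheme, so the following is a minimal illustrative sketch in Python, not the paper's DCS-SRL/DCS-SPRL implementation. The toy chain world, the Subproblem class, and the choice of the largest recent Bellman error as the scheduling weight are all assumptions made here for illustration; the paper's actual weighted priority definition and recombination procedure are not reproduced.

```python
import random
from collections import defaultdict

# Hypothetical toy setting: a 1-D chain of states partitioned into
# contiguous regions; each region becomes one independently solvable
# subproblem, mirroring the divide-and-conquer decomposition.
N_STATES, N_REGIONS = 100, 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # move left / right along the chain

def step(s, a):
    """One move along the chain; reward 1 only at the goal state."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

class Subproblem:
    """Tabular Q-learning restricted to one region of the state space."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.q = defaultdict(float)  # unseen (state, action) pairs default to 0
        self.priority = 1.0          # scheduler weight; starts optimistic

    def sweep(self, episodes=20, alpha=0.1, gamma=0.95, eps=0.2):
        """A bounded amount of learning; returns the largest Bellman error."""
        max_delta = 0.0
        for _ in range(episodes):
            s = random.randrange(self.lo, self.hi + 1)
            for _ in range(50):
                if random.random() < eps:
                    a = random.choice(ACTIONS)
                else:
                    a = max(ACTIONS, key=lambda act: self.q[(s, act)])
                s2, r = step(s, a)
                target = r + gamma * max(self.q[(s2, b)] for b in ACTIONS)
                delta = target - self.q[(s, a)]
                self.q[(s, a)] += alpha * delta
                max_delta = max(max_delta, abs(delta))
                if s2 == GOAL:
                    break
                s = s2
        # Stand-in weight: regions whose values are still changing are
        # expected to be the most productive places to compute next.
        self.priority = max_delta
        return max_delta

# Decompose the state space into contiguous regions, one subproblem each.
width = N_STATES // N_REGIONS
regions = [Subproblem(i * width, (i + 1) * width - 1) for i in range(N_REGIONS)]

# Weighted priority scheduling: always hand the next learning sweep to
# the subproblem with the largest weight.
for _ in range(50):
    max(regions, key=lambda sub: sub.priority).sweep()

# Recombine the component solutions into one global value table.
global_q = {}
for sub in regions:
    global_q.update(sub.q)
```

Because each subproblem learns independently, a parallel variant in the spirit of DCS-SPRL could dispatch the sweep() calls for the highest-weight subproblems to a pool of worker processes (e.g., Python's multiprocessing.Pool) instead of running them sequentially; this sketch also ignores the coupling between region boundaries that a full recombination step would have to handle.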
Source: Frontiers of Computer Science (SCIE, EI, CSCD), 2012, No. 6, pp. 631-646 (16 pages).
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61272005, 61070223, 61103045, 60970015, and 61170020), the Natural Science Foundation of Jiangsu Province (BK2012616, BK2009116), the Natural Science Foundation of the Jiangsu Higher Education Institutions (09KJA520002), and the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (93K172012K04).
Keywords: divide-and-conquer strategy, parallel schedule, scalability, large state space, continuous state space
