Reinforcement learning with partitioning function system

Abstract: The size of the state space is the limiting factor in applying reinforcement learning algorithms to practical problems. This article establishes a reinforcement learning system with a partitioning function (RLWPF), in which the state space is partitioned into several regions. The operating principle of RLWPF is based on a semi-Markov decision process and is general: it can be applied to any reinforcement learning problem with a large state space. In RLWPF, the partitioning module dispatches agents to different regions in order to reduce the state space that each agent must handle. The article proves the convergence of the SARSA algorithm for a semi-Markov decision process, and establishes the convergence of RLWPF by analyzing the equivalence of the value functions of the two semi-Markov decision processes before and after partitioning. It also shows that the optimal policy learned by RLWPF is consistent with prior domain knowledge. As an application, an elevator group control system is devised to decrease the average waiting time of passengers, with four agents each controlling one of four elevator cars. Based on RLWPF, a partitioning module is developed that uses a uniform round-trip time as the partitioning criterion, so that the waiting time of most passengers is roughly the same; each elevator car then answers only the hall calls in its own region. Compared with ordinary elevator systems and with reinforcement learning systems without a partitioning module, the results show the advantage of RLWPF.
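The abstract does not give the update rule itself, so the following is only a minimal sketch of how a SARSA step is commonly written for a semi-Markov decision process, where the time `tau` spent between two decisions varies. The exponential discount `exp(-beta * tau)`, the function names, the state and action labels, and the parameter values are all assumptions for illustration and are not taken from the article.

```python
import math
import random
from collections import defaultdict


def smdp_sarsa_update(q, s, a, reward, tau, s_next, a_next, alpha=0.1, beta=0.01):
    """Apply one SARSA step for a transition that lasted `tau` time units.

    Assumed form (not from the article): the next state-action value is
    discounted by exp(-beta * tau), so longer sojourn times count for less.
    `reward` is the reward already accumulated over the transition.
    """
    discount = math.exp(-beta * tau)
    target = reward + discount * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


def epsilon_greedy(q, s, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda act: q[(s, act)])


# Hypothetical usage: one agent, with states and actions named only for illustration.
q = defaultdict(float)
rng = random.Random(0)
actions = ["up", "stop"]
a_next = epsilon_greedy(q, "s1", actions, epsilon=0.1, rng=rng)
smdp_sarsa_update(q, "s0", "up", reward=-2.0, tau=5.0, s_next="s1", a_next=a_next)
```

Because SARSA is on-policy, `a_next` is the action the agent actually takes in `s_next`, not the greedy maximum. In a partitioned system of the kind described above, each agent would presumably keep its own table `q` over the smaller state space of its region.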

Authors: Li Wei (李伟), Ye Qingtai (叶庆泰), Zhu Changming (朱昌明)

Affiliation: College of Machine and Dynamics Engineering

Source: Journal of Harbin Institute of Technology (New Series) (哈尔滨工业大学学报（英文版）), indexed in EI and CAS, 2004, No. 4, pp. 377-381 (5 pages)

Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 69975013).

Keywords: multi-agent systems; partitioning function; reinforcement learning; semi-Markov decision; partitioning module; elevator

Classification number: TP182 [Automation and Computer Technology: Control Theory and Control Engineering]
