
Reinforcement learning with partitioning function system

Abstract: The size of the state space is the limiting factor in applying reinforcement learning algorithms to practical problems. A reinforcement learning system with a partitioning function (RLWPF) is established, in which the state space is partitioned into several regions. The underlying principle of RLWPF is based on a semi-Markov decision process and is general: it can be applied to any reinforcement learning problem with a large state space. In RLWPF, the partitioning module dispatches agents to different regions in order to reduce the state space of each agent. This article proves the convergence of the SARSA algorithm for a semi-Markov decision process, and establishes the convergence of RLWPF by analyzing the equivalence of the value functions of the two semi-Markov decision processes before and after partitioning. It also shows that the optimal policy learned by RLWPF is consistent with prior domain knowledge. An elevator group control system is devised to reduce the average waiting time of passengers, with four agents each controlling one of four elevator cars. Based on RLWPF, a partitioning module is developed that uses a uniform round trip time as the partitioning criterion, so that the waiting times of most passengers are roughly equal; each elevator car then answers hall calls only in its own region. Performance results show the advantage of RLWPF over ordinary elevator systems and over reinforcement learning systems without a partitioning module.
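As a rough illustration of the two ideas in the abstract, the Python sketch below shows a tabular SARSA update for a semi-Markov decision process and a lookup that maps a hall call to a region. It is not the authors' implementation. The exponential discount `exp(-beta * tau)` over the sojourn time `tau`, the reward sign convention, the function names, and the floor boundaries are all assumptions made for illustration; the abstract does not give these details, and the round-trip-time computation that would choose the boundaries is not shown.

```python
import math
from collections import defaultdict


def smdp_sarsa_update(q, s, a, reward, tau, s_next, a_next,
                      alpha=0.1, beta=0.01):
    """Apply one tabular SARSA update for a semi-Markov decision process.

    q       -- dict mapping (state, action) to an estimated value
    reward  -- reward accumulated during the transition (assumed convention)
    tau     -- sojourn time between the two decision points
    beta    -- continuous-time discount rate (assumed form: exp(-beta * tau))
    """
    discount = math.exp(-beta * tau)
    target = reward + discount * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


def assign_region(floor, upper_bounds):
    """Return the index of the region (and hence the car) serving a floor.

    upper_bounds -- ascending list of the highest floor in each region;
                    these values are hypothetical, not taken from the paper.
    """
    for region, upper in enumerate(upper_bounds):
        if floor <= upper:
            return region
    raise ValueError(f"floor {floor} lies outside every region")


if __name__ == "__main__":
    q = defaultdict(float)
    # Hypothetical partition of a 20-floor building among four cars.
    bounds = [5, 10, 15, 20]
    car = assign_region(7, bounds)
    value = smdp_sarsa_update(q, ("car", car, 7), "stop", reward=-3.0,
                              tau=4.0, s_next=("car", car, 8), a_next="up")
    print(car, value)
```

Because each agent only sees hall calls whose floors fall in its own region, its table `q` is indexed by a much smaller set of states than a single agent covering the whole building would need.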
Source: Journal of Harbin Institute of Technology (New Series) (哈尔滨工业大学学报(英文版)), indexed in EI and CAS, 2004, No. 4, pp. 377-381 (5 pages)
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 69975013).
Keywords: multi-agent systems; partitioning function; reinforcement learning; semi-Markov decision process; partitioning module; elevator

