
Cooperative learning with joint state value approximation for multi-agent systems (Cited by: 1)

Abstract: This paper relieves the 'curse of dimensionality' problem, which becomes intractable when reinforcement learning is scaled to multi-agent systems. The problem is aggravated exponentially as the number of agents increases, resulting in large memory requirements and slow learning. For cooperative systems, which are common among multi-agent systems, this paper proposes a new multi-agent Q-learning algorithm that decomposes the learning of the joint state and joint action into two processes: learning individual actions, and approximating the maximum value of the joint state. The latter process takes the other agents' actions into account to ensure that the joint action is optimal, and supports the updating of the former. Simulation results illustrate that the proposed algorithm learns the optimal joint behavior with less memory and faster learning speed than friend-Q learning and independent learning.
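To make the decomposition concrete, below is a minimal sketch of the kind of two-process learner the abstract describes: each agent keeps an individual table Q_i(s, a_i), and a shared approximation V(s) of the maximum joint-state value both couples the agents and serves as the bootstrap target. All names, constants, and the exact update rule here are assumptions for illustration; the abstract does not give the paper's actual equations.

```python
from collections import defaultdict

# Hypothetical sketch of the decomposition described in the abstract:
# instead of one Q-table over joint states and joint actions (roughly
# |S| * |A|^n entries for n agents), each agent i keeps a small individual
# table Q_i(s, a_i), and a shared approximation V(s) of the maximum
# joint-state value supports the individual updates. The update rule and
# all names are assumptions, not the paper's equations.

ALPHA = 0.1   # learning rate (assumed)
GAMMA = 0.95  # discount factor (assumed)

class DecomposedQLearner:
    def __init__(self, n_agents, actions):
        self.n_agents = n_agents
        self.actions = list(actions)
        # n small per-agent tables instead of one exponential joint table
        self.q = [defaultdict(float) for _ in range(n_agents)]
        self.v = defaultdict(float)  # approximate max value of each joint state

    def greedy_joint_action(self, state):
        # each agent maximizes its own table; the shared V(s) target is
        # what steers the individual choices toward a coordinated optimum
        return tuple(max(self.actions, key=lambda a: self.q[i][(state, a)])
                     for i in range(self.n_agents))

    def update(self, state, joint_action, reward, next_state):
        # bootstrap every individual Q-value from the shared joint-state value
        target = reward + GAMMA * self.v[next_state]
        for i, a_i in enumerate(joint_action):
            self.q[i][(state, a_i)] += ALPHA * (target - self.q[i][(state, a_i)])
        # refresh V(s) from the agents' current greedy joint action
        best = self.greedy_joint_action(state)
        self.v[state] = sum(self.q[i][(state, a)] for i, a in enumerate(best))
```

Under these assumptions the memory cost falls from on the order of |S|·|A|^n entries for a joint-action table to n·|S|·|A| + |S|, which is the kind of saving the abstract reports relative to friend-Q learning and independent learning.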
Source: Journal of Control Theory and Applications (《控制理论与应用(英文版)》), indexed in EI and CSCD, 2013, No. 2: 149 - 155 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (Nos. 61074058, 60874042), the China Postdoctoral Science Foundation (No. 200902483), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20090162120068), and the Central South University Innovation Project (No. 2011ssxt221)
Keywords: multi-agent system; Q-learning; cooperative system; curse of dimensionality; decomposition

References (18)

  • 1. G. Weiss. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. Cambridge: MIT Press, 1999.
  • 2. N. Vlassis. A concise introduction to multiagent systems and distributed artificial intelligence. Synthesis Lectures on Artificial Intelligence and Machine Learning, 2007, 1(1): 1 - 71.
  • 3. M. Wu, W. Cao, J. Peng, et al. Balanced reactive-deliberative architecture for multi-agent system for simulation league of RoboCup. International Journal of Control, Automation and Systems, 2009, 7(6): 945 - 955.
  • 4. K. Tumer, A. Agogino. Improving air traffic management with a learning multiagent system. IEEE Intelligent Systems, 2009, 24(1): 18 - 21.
  • 5. S. Proper, P. Tadepalli. Solving multiagent assignment Markov decision processes. Proceedings of the 8th International Joint Conference on Autonomous Agents and Multiagent Systems. Richland: IFAAMAS, 2009: 681 - 688.
  • 6. J. R. Kok, M. T. J. Spaan, N. Vlassis. Non-communicative multi-robot coordination in dynamic environments. Robotics and Autonomous Systems, 2005, 50(2/3): 99 - 114.
  • 7. M. L. Littman. Friend-or-Foe Q-learning in general-sum games. Proceedings of the 18th International Conference on Machine Learning. Williamstown: Morgan Kaufmann, 2001: 322 - 328.
  • 8. X. Wang, T. Sandholm. Reinforcement learning to play an optimal Nash equilibrium in team Markov games. Advances in Neural Information Processing Systems. Cambridge: MIT Press, 2002: 1571 - 1578.
  • 9. R. I. Brafman, M. Tennenholtz. R-max: a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 2002, 3(2): 213 - 231.
  • 10. L. Busoniu, R. Babuska, B. De Schutter. A comprehensive survey of multi-agent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2008, 38(2): 156 - 172.
