
Cooperative learning with joint state value approximation for multi-agent systems (Cited by: 1)

Abstract: This paper relieves the 'curse of dimensionality' problem, which becomes intractable when reinforcement learning is scaled to multi-agent systems. The problem is aggravated exponentially as the number of agents increases, resulting in large memory requirements and slow learning. For cooperative systems, which are common among multi-agent systems, this paper proposes a new multi-agent Q-learning algorithm that decomposes the learning of the joint state and joint action into two processes: learning individual actions, and approximating the maximum value of the joint state. The latter process takes the other agents' actions into account to ensure that the joint action is optimal, and supports the updating of the former. Simulation results illustrate that the proposed algorithm learns the optimal joint behavior with less memory and faster learning speed than friend-Q learning and independent learning.
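To make the decomposition concrete, below is a minimal sketch of the kind of two-process learner the abstract describes: each agent keeps an individual table Q_i(s, a_i), and a shared approximation V(s) of the maximum joint-state value both couples the agents and serves as the bootstrap target. All names, constants, and the exact update rule here are assumptions for illustration; the abstract does not give the paper's actual equations.

```python
from collections import defaultdict

# Hypothetical sketch of the decomposition described in the abstract:
# instead of one Q-table over joint states and joint actions (roughly
# |S| * |A|^n entries for n agents), each agent i keeps a small individual
# table Q_i(s, a_i), and a shared approximation V(s) of the maximum
# joint-state value supports the individual updates. The update rule and
# all names are assumptions, not the paper's equations.

ALPHA = 0.1   # learning rate (assumed)
GAMMA = 0.95  # discount factor (assumed)

class DecomposedQLearner:
    def __init__(self, n_agents, actions):
        self.n_agents = n_agents
        self.actions = list(actions)
        # n small per-agent tables instead of one exponential joint table
        self.q = [defaultdict(float) for _ in range(n_agents)]
        self.v = defaultdict(float)  # approximate max value of each joint state

    def greedy_joint_action(self, state):
        # each agent maximizes its own table; the shared V(s) target is
        # what steers the individual choices toward a coordinated optimum
        return tuple(max(self.actions, key=lambda a: self.q[i][(state, a)])
                     for i in range(self.n_agents))

    def update(self, state, joint_action, reward, next_state):
        # bootstrap every individual Q-value from the shared joint-state value
        target = reward + GAMMA * self.v[next_state]
        for i, a_i in enumerate(joint_action):
            self.q[i][(state, a_i)] += ALPHA * (target - self.q[i][(state, a_i)])
        # refresh V(s) from the agents' current greedy joint action
        best = self.greedy_joint_action(state)
        self.v[state] = sum(self.q[i][(state, a)] for i, a in enumerate(best))
```

Under these assumptions the memory cost falls from on the order of |S|·|A|^n entries for a joint-action table to n·|S|·|A| + |S|, which is the kind of saving the abstract reports relative to friend-Q learning and independent learning.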
Source: Journal of Control Theory and Applications (《控制理论与应用(英文版)》), indexed in EI and CSCD, 2013, No. 2: 149 - 155 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (Nos. 61074058, 60874042), the China Postdoctoral Science Foundation (No. 200902483), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20090162120068), and the Central South University Innovation Project (No. 2011ssxt221)
Keywords: multi-agent system; Q-learning; cooperative system; curse of dimensionality; decomposition

References (18)

  • 1. G. Weiss. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. Cambridge: MIT Press, 1999.
  • 2. N. Vlassis. A concise introduction to multiagent systems and distributed artificial intelligence. Synthesis Lectures on Artificial Intelligence and Machine Learning, 2007, 1(1): 1 - 71.
  • 3. M. Wu, W. Cao, J. Peng, et al. Balanced reactive-deliberative architecture for multi-agent system for simulation league of RoboCup. International Journal of Control, Automation and Systems, 2009, 7(6): 945 - 955.
  • 4. K. Tumer, A. Agogino. Improving air traffic management with a learning multiagent system. IEEE Intelligent Systems, 2009, 24(1): 18 - 21.
  • 5. S. Proper, P. Tadepalli. Solving multiagent assignment Markov decision processes. Proceedings of the 8th International Joint Conference on Autonomous Agents and Multiagent Systems. Richland: IFAAMAS, 2009: 681 - 688.
  • 6. J. R. Kok, M. T. J. Spaan, N. Vlassis. Non-communicative multi-robot coordination in dynamic environments. Robotics and Autonomous Systems, 2005, 50(2/3): 99 - 114.
  • 7. M. L. Littman. Friend-or-Foe Q-learning in general-sum games. Proceedings of the 18th International Conference on Machine Learning. Williamstown: Morgan Kaufmann, 2001: 322 - 328.
  • 8. X. Wang, T. Sandholm. Reinforcement learning to play an optimal Nash equilibrium in team Markov games. Advances in Neural Information Processing Systems. Cambridge: MIT Press, 2002: 1571 - 1578.
  • 9. R. I. Brafman, M. Tennenholtz. R-max: a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 2002, 3(2): 213 - 231.
  • 10. L. Busoniu, R. Babuska, B. De Schutter. A comprehensive survey of multi-agent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2008, 38(2): 156 - 172.
