
Research on Multi-agent Reinforcement Learning with Sharing Experience (共享经验的多主体强化学习研究)
Cited by: 4

Abstract: The key issue in cooperative multi-agent reinforcement learning is how to improve learning efficiency. Taking the pursuit problem as a testbed, this paper proposes a multi-agent reinforcement learning method with shared experience: by constructing an appropriate state space, the hunters share their learning experience, and the state space is further compressed by exploiting the symmetry of the pursuit problem. Experimental results show that sharing the state space speeds up multi-agent reinforcement learning, and the smaller the state space, the faster the Q-learning algorithm converges.
Authors: 焦殿科 (Jiao Dianke), 石川 (Shi Chuan)
Source: Computer Engineering (《计算机工程》), CAS / CSCD / PKU Core journal, 2008, No. 11, pp. 219-221 (3 pages)
Keywords: multi-agent collaboration; reinforcement learning; Q-learning algorithm; state space
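
The abstract above describes hunters that share learning experience through a common state space, compress that space using the symmetry of the pursuit problem, and run Q-learning over the result. As a rough illustration only, the Python sketch below shows one plausible shape of such a shared, symmetry-reduced Q-table; the relative-offset state encoding, the reflection-based compression, the epsilon-greedy policy, and all hyperparameters are assumptions made here for illustration and are not taken from the paper.

import random
from collections import defaultdict

# Hunter moves as (dx, dy) offsets: right, left, up, down.
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning rate, discount, exploration rate

# A single Q-table shared by all hunters: every hunter's experience updates it,
# which is the "shared experience" idea described in the abstract.
Q = defaultdict(float)

def canonicalize(dx, dy):
    """Compress the state space using the mirror symmetry of the pursuit task:
    reflect the hunter-to-prey offset into the quadrant dx >= 0, dy >= 0 and
    remember which reflections were applied so actions can be mapped back."""
    flip_x, flip_y = dx < 0, dy < 0
    return (abs(dx), abs(dy)), flip_x, flip_y

def map_action(action_index, flip_x, flip_y):
    """Translate an action chosen in the canonical frame back into the
    hunter's real frame by undoing the reflections."""
    ax, ay = ACTIONS[action_index]
    return (-ax if flip_x else ax, -ay if flip_y else ay)

def choose_action(state):
    """Epsilon-greedy action selection over the shared Q-table (canonical frame)."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    values = [Q[(state, a)] for a in range(len(ACTIONS))]
    return values.index(max(values))

def q_update(state, action, reward, next_state):
    """One-step Q-learning update written into the shared table, so the
    experience of one hunter is immediately available to the others."""
    best_next = max(Q[(next_state, a)] for a in range(len(ACTIONS)))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

In an episode loop, each hunter would canonicalize its offset to the prey, pick an action with choose_action, convert it back with map_action before stepping the environment, and then call q_update on the shared table. Because all hunters read and write the same table over a smaller, symmetry-reduced state space, each state-action pair is visited more often, which is the mechanism the abstract credits for faster Q-learning convergence.
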
Related Literature

References (6)

  • 1 Mitchell T M. Machine Learning[M]. Translated by Zeng Huajun, Zhang Yinkui. Beijing: China Machine Press, 2003.
  • 2 Nitschke G. Emergence of Cooperation in a Pursuit-evasion Game[C]//Proc. of the 18th International Joint Conference on Artificial Intelligence. Acapulco, Mexico: [s. n.], 2003: 639-646.
  • 3 Tan M. Multi-agent Reinforcement Learning: Independent vs. Cooperative Agents[C]//Proc. of the 10th International Conference on Machine Learning. Amherst, MA: [s. n.], 1993: 330-337.
  • 4 Nunes L. Cooperative Learning Using Advice Exchange[M]. Berlin, Heidelberg, Germany: Springer-Verlag, 2003: 33-48.
  • 5 Wang Changying, Yin Xiaohu, Bao Yiping, Yao Li. A multi-agent cooperative reinforcement learning algorithm with shared experience tuples[J]. Pattern Recognition and Artificial Intelligence, 2005, 18(2): 234-239. Cited by: 4
  • 6 Cai Qingsheng, Zhang Bo. A reinforcement learning model based on agent teams and its application[J]. Journal of Computer Research and Development, 2000, 37(9): 1087-1093. Cited by: 31

Secondary References (7)

  • 1 Kaelbling L P, Littman M L, Moore A W. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 1996, 4: 237-285.
  • 2 Watkins C, Dayan P. Q-Learning. Machine Learning, 1992, 8: 279-292.
  • 3 Nitschke G. Emergence of Cooperation in a Pursuit-Evasion Game. In: Proc. of the 18th International Joint Conference on Artificial Intelligence. Acapulco, Mexico, 2003: 639-646.
  • 4 Tan M. Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. In: Proc. of the 10th International Conference on Machine Learning. Amherst, USA, 1993: 330-337.
  • 5 Hu J L, Wellman M P. Nash Q-Learning for General-Sum Stochastic Games. Journal of Machine Learning Research, 2003, 4: 1-30.
  • 6 Ribeiro C. Reinforcement Learning Agents. Artificial Intelligence Review, 2002, 17(3): 223-250.
  • 7 Cai Qingsheng, Zhang Bo. A reinforcement learning model based on agent teams and its application[J]. Journal of Computer Research and Development, 2000, 37(9): 1087-1093. Cited by: 31

Co-citing Literature (62)

Co-cited Literature (20)

  • 1 Wang Changying, Yin Xiaohu, Bao Yiping, Yao Li. A multi-agent cooperative reinforcement learning algorithm with shared experience tuples[J]. Pattern Recognition and Artificial Intelligence, 2005, 18(2): 234-239. Cited by: 4
  • 2 Makino T, Aihara K. Multi-Agent Reinforcement Learning Algorithm to Handle Beliefs of Other Agents' Policies and Embedded Beliefs[C]//Proc. of AAMAS'06. Hakodate, Japan: [s. n.], 2006.
  • 3 Stone P, Sutton R S, Kuhlmann G. Reinforcement Learning for RoboCup Soccer Keepaway[J]. Adaptive Behavior, 2005, 13(3): 165-188.
  • 4 Marthi B. Automatic Shaping and Decomposition of Reward Functions[C]//Proceedings of the 24th International Conference on Machine Learning. Corvallis, USA: [s. n.], 2007.
  • 5 Torrey L, Shavlik J, Walker T, et al. Skill Acquisition via Transfer Learning and Advice Taking[M]. Berlin, Germany: Springer, 2006: 425-436.
  • 6 Bianchi R A C, Ribeiro C H C, Costa A H R. Heuristically Accelerated Q-learning: A New Approach to Speed Up Reinforcement Learning[J]. Lecture Notes in Artificial Intelligence, 2004, 3171: 245-254.
  • 7 Sutton R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1988, 3(1): 9-44.
  • 8 Watkins C J C H, Dayan P. Technical note: Q-learning[J]. Machine Learning, 1992, 8(3/4): 279-292.
  • 9 Ito K, Imoto Y, Taguchi H, et al. A study of reinforcement learning with knowledge sharing[C]//Proceedings of the 2004 IEEE International Conference on Robotics and Biomimetics. Japan: Okayama University Digital Information Repository, 2004: 175-180.
  • 10 Shen Jing, Cheng Xiaobei, Liu Haibo, Gu Guochang, Zhang Guoyin. Hierarchical reinforcement learning in dynamic environments[J]. Control Theory & Applications, 2008, 25(1): 71-74. Cited by: 5

Citing Literature (4)

Secondary Citing Literature (3)
