
The Study of a Multi-agent Learning Algorithm Based on a Variable Learning Rate
Abstract: This paper studies the IGA (Infinitesimal Gradient Ascent) algorithm in a dynamic learning environment and addresses its shortcoming of a constant step size along the gradient direction. A variable learning rate is introduced, together with the WoLF ("Win or Learn Fast") principle for adjusting it, in order to accelerate convergence. Finally, the Q-learning algorithm is improved on the basis of this method, and the effectiveness of the improved algorithm is demonstrated through simulation experiments.
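The record does not include the paper's own derivations, but the WoLF-IGA update rule it builds on (see reference 4 below) can be stated briefly; the notation follows that reference, not necessarily this paper. In a two-player, two-action game, let \alpha and \beta be the probabilities with which the row and column players choose their first action, V_r(\alpha, \beta) the row player's expected payoff, and \alpha^e an equilibrium strategy. The row player ascends its payoff gradient with a base step \eta scaled by a variable learning rate \ell_t:

\[
\alpha_{t+1} = \alpha_t + \eta \, \ell_t \, \frac{\partial V_r(\alpha_t, \beta_t)}{\partial \alpha},
\qquad
\ell_t =
\begin{cases}
\delta_w, & V_r(\alpha_t, \beta_t) > V_r(\alpha^e, \beta_t) \quad \text{(winning)} \\
\delta_l, & \text{otherwise (losing)},
\end{cases}
\qquad \delta_l > \delta_w .
\]

Learning slowly while winning and quickly while losing is what turns the cycling dynamics of plain IGA into convergent ones.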
Author: 李琳娜 (Li Linna)
Source: Journal of Changchun Institute of Technology (Natural Sciences Edition), 2009, No. 4: 81-83 (3 pages)
Keywords: multi-agent learning; variable learning rate; Q-learning
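The abstract describes applying the WoLF variable learning rate to Q-learning; the concrete form of that combination in reference 4 is WoLF policy hill-climbing (WoLF-PHC). The following Python sketch illustrates that technique under stated assumptions. It is not the paper's own code (the full text is not part of this record), and all names (WoLFPHCAgent, delta_win, delta_lose) are illustrative.

import random
from collections import defaultdict

class WoLFPHCAgent:
    """Sketch of WoLF policy hill-climbing (Bowling and Veloso, 2001).

    Two policy-update rates are kept: a small delta_win used while
    "winning" and a larger delta_lose used while "losing" (the Win
    or Learn Fast principle). Names here are illustrative.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        self.n = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose
        self.Q = defaultdict(lambda: [0.0] * n_actions)
        # Current mixed policy and its running average.
        self.pi = defaultdict(lambda: [1.0 / n_actions] * n_actions)
        self.avg_pi = defaultdict(lambda: [1.0 / n_actions] * n_actions)
        self.count = defaultdict(int)

    def act(self, state):
        # Sample an action from the current mixed policy.
        r, acc = random.random(), 0.0
        for a, p in enumerate(self.pi[state]):
            acc += p
            if r <= acc:
                return a
        return self.n - 1

    def update(self, state, action, reward, next_state):
        # Ordinary Q-learning backup.
        q = self.Q[state]
        q[action] += self.alpha * (
            reward + self.gamma * max(self.Q[next_state]) - q[action])

        # Maintain the running-average policy.
        self.count[state] += 1
        for a in range(self.n):
            self.avg_pi[state][a] += (
                self.pi[state][a] - self.avg_pi[state][a]) / self.count[state]

        # WoLF test: "winning" if the current policy scores better
        # against Q than the average policy does.
        cur = sum(p * v for p, v in zip(self.pi[state], q))
        avg = sum(p * v for p, v in zip(self.avg_pi[state], q))
        delta = self.delta_win if cur > avg else self.delta_lose

        # Hill-climb: move probability mass toward the greedy action,
        # slowly while winning, quickly while losing.
        best = max(range(self.n), key=lambda a: q[a])
        moved = 0.0
        for a in range(self.n):
            if a != best:
                step = min(self.pi[state][a], delta / (self.n - 1))
                self.pi[state][a] -= step
                moved += step
        self.pi[state][best] += moved

# Toy usage: WoLF-PHC self-play in matching pennies. Both mixed
# strategies should drift toward the equilibrium (0.5, 0.5).
if __name__ == "__main__":
    a1, a2 = WoLFPHCAgent(2), WoLFPHCAgent(2)
    for _ in range(50000):
        x, y = a1.act(0), a2.act(0)
        r = 1.0 if x == y else -1.0  # a1 wins on a match
        a1.update(0, x, r, 0)
        a2.update(0, y, -r, 0)
    print(a1.pi[0], a2.pi[0])

The whole mechanism sits in the last block of update(): probability mass moves toward the greedy action at rate delta_lose when the current policy underperforms its historical average (losing), and at the smaller rate delta_win when it overperforms (winning), which is exactly the two-speed rule stated above.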

References (6)

  • 1 Song Meiping, Gu Guochang, Zhang Guoyin. A survey of multi-agent reinforcement learning methods under the stochastic game framework [J]. Control and Decision, 2005, 20(10): 1081-1090.
  • 2 Baird L C, Moore A W. Gradient descent for general reinforcement learning [A]. Advances in Neural Information Processing Systems 11 [C]. Cambridge, MA: MIT Press, 1999: 273-389.
  • 3 Mangasarian O L, Stone H. Two-person nonzero-sum games and quadratic programming [J]. J. Math. Anal. Appl., 1964, (9): 348-355.
  • 4 Bowling M, Veloso M. Variable learning rate and the convergence of gradient dynamics [A]. Proc of the 18th International Conference on Machine Learning [C]. Williamstown, MA, USA, 2001: 27-34.
  • 5 Singh S, Kearns M, Mansour Y. Nash convergence of gradient dynamics in general-sum games [A]. Proc of the 16th Conference on Uncertainty in Artificial Intelligence [C]. San Francisco, CA: Morgan Kaufmann, 2000: 541-548.
  • 6 Littman M L. Markov games as a framework for multi-agent reinforcement learning [A]. Proc of the 11th International Conference on Machine Learning [C]. New Brunswick, NJ: Morgan Kaufmann, 1994: 157-163.
