
Parallel Rollout Learning Algorithms in Multi-agent MDPs

Parallel rollout algorithms for multi-agent MDPs
Abstract: This paper studies learning in multi-agent Markov decision processes (MDPs) on the basis of rollout algorithms (RA), within the framework of performance-potential theory. Neuro-dynamic programming (NDP) approximation is used to reduce the algorithms' space complexity and thus mitigate the "curse of dimensionality". Because rollout algorithms have strong intrinsic parallelism, a parallel solution method is also analyzed to shorten running time. Finally, a simulation of multi-level inventory control in a supply-chain environment verifies the effectiveness of rollout algorithms for multi-agent learning.
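The one-step rollout idea summarized in the abstract can be sketched as follows. This is a minimal, hypothetical single-item inventory example, not the paper's actual multi-level model: the state (stock on hand), cost parameters, demand distribution, and order-up-to base policy are all illustrative assumptions. Candidate actions are evaluated in parallel, reflecting the intrinsic parallelism the paper exploits:

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical single-item inventory model: state = stock on hand,
# action = order quantity, costs = holding + stockout penalty.
HOLD_COST, PENALTY, MAX_STOCK = 1.0, 4.0, 10

def step(stock, order, demand):
    """One period: receive the order, face demand, return (next_stock, cost)."""
    stock = min(stock + order, MAX_STOCK)
    unmet = max(demand - stock, 0)
    stock = max(stock - demand, 0)
    return stock, HOLD_COST * stock + PENALTY * unmet

def base_policy(stock):
    """Heuristic base policy: order up to a fixed target level of 5."""
    return max(5 - stock, 0)

def rollout_value(stock, first_order, horizon=20, n_sims=50, seed=0):
    """Monte Carlo estimate of the cost of taking `first_order` now,
    then following the base policy for the remaining periods."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        s, acc = step(stock, first_order, rng.randint(0, 6))
        s, cost = s[0] if isinstance(s, tuple) else s, acc[1] if isinstance(acc, tuple) else acc
        s, acc = step(stock, first_order, rng.randint(0, 6))
        for _ in range(horizon - 1):
            s, cost = step(s, base_policy(s), rng.randint(0, 6))
            acc += cost
        total += acc
    return total / n_sims

def rollout_action(stock, actions=range(0, 8)):
    """One-step rollout: evaluate each candidate first action in parallel
    (the per-action simulations are independent) and pick the cheapest."""
    with ThreadPoolExecutor() as pool:
        values = list(pool.map(lambda a: rollout_value(stock, a), actions))
    return min(zip(actions, values), key=lambda av: av[1])[0]

chosen = rollout_action(0)  # best first order from an empty warehouse
```

The rollout policy improves on the base heuristic by one step of lookahead; the NDP approximation discussed in the paper would replace the Monte Carlo cost estimate with a trained value-function approximator to cut the space and simulation cost.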
Author: 李豹 (Li Bao)
Source: Journal of Anhui Polytechnic University (《安徽工程大学学报》, CAS), 2014, No. 2, pp. 75-78 (4 pages)
Keywords: rollout algorithms; neuro-dynamic programming; multi-agent learning; performance potentials; parallel algorithms

References (13)

  • 1. Littman M. Markov games as a framework for multi-agent reinforcement learning [C]// Proceedings of the Eleventh International Conference on Machine Learning. San Francisco: Morgan Kaufmann Publishers, 1994: 157-163.
  • 2. Littman M. Friend-or-foe Q-learning in general-sum Markov games [C]// Proceedings of the Eighteenth International Conference on Machine Learning. Williams College, MA. San Mateo, CA: Morgan Kaufmann Publishers, 2001: 322-328.
  • 3. Hu J, Wellman M. Nash Q-learning for general-sum stochastic games [J]. Journal of Machine Learning Research, 2003, 4: 1039-1069.
  • 4. Greenwald A, Hall K. Correlated Q-learning [C]// Proceedings of the Twentieth International Conference on Machine Learning. Washington, DC, USA: AAAI Press, 2003: 242-249.
  • 5. 李豹, 程文娟, 周雷, 唐昊. Application of rollout and its parallel algorithms to multi-item inventory control [J]. Journal of System Simulation, 2007, 19(17): 3883-3887.
  • 6. Bertsekas D P. Dynamic Programming and Optimal Control, Vol. II, 4th Edition: Approximate Dynamic Programming [M]. Belmont, MA: Athena Scientific, 2012.
  • 7. Sutton R S, Barto A G. Reinforcement Learning: An Introduction [M]. Cambridge, MA: MIT Press, 1998.
  • 8. Bertsekas D P, Tsitsiklis J N, Wu C. Rollout algorithms for combinatorial optimization [J]. Journal of Heuristics, 1997, 3: 245-262.
  • 9. Bertsekas D P. Rollout algorithms for discrete optimization: a survey [C]// Handbook of Combinatorial Optimization. Berlin: Springer, 2005: 2989-3014.
  • 10. Li X, Cao X R. Performance optimization of queueing systems with perturbation realization [J]. European Journal of Operational Research, 2012, 218(2): 293-304.

Secondary references (10)

  • 1. 代桂平, 殷保群, 王肖龙, 奚宏生. Performance optimization and iterative algorithms for controlled M/G/1 queueing systems [J]. Journal of System Simulation, 2004, 16(8): 1683-1685.
  • 2. 唐昊, 周雷, 袁继彬. A unified NDP method based on TD(0) learning for MDPs under average and discounted criteria [J]. Control Theory & Applications, 2006, 23(2): 292-296.
  • 3. 胡奇英, 刘建庸. Introduction to Markov Control Processes [M]. Xi'an: Xidian University Press, 2000.
  • 4. Bertsekas D P, Tsitsiklis J N. Neuro-Dynamic Programming [M]. Belmont, MA: Athena Scientific, 1996.
  • 5. Bertsekas D P, Tsitsiklis J N, Wu C. Rollout algorithms for combinatorial optimization [J]. Journal of Heuristics (S1381-1237), 1997, 3(3): 245-262.
  • 6. Bertsekas D P. Differential training of rollout policies [C]// Proc. of the 35th Allerton Conference on Communication, Control, and Computing. Allerton Park, IL, 1997.
  • 7. Cao X R, Chen H F. Perturbation realization, potentials and sensitivity analysis of Markov processes [J]. IEEE Trans. on Automatic Control (S0018-9286), 1997, 42(10): 1382-1393.
  • 8. Cao X R. Single sample path-based optimization of Markov chains [J]. Journal of Optimization Theory and Applications (S0022-3239), 1999, 100(3): 527-548.
  • 9. Cao X R. From perturbation analysis to Markov decision processes and reinforcement learning [J]. Discrete Event Dynamic Systems: Theory and Applications (S0924-6703), 2003, 13(1): 9-39.
  • 10. 高旭东, 殷保群, 唐昊, 奚宏生. Parallel optimization of Markov control processes based on performance-potential simulation [J]. Journal of System Simulation, 2003, 15(11): 1574-1576.
