
Parallel Rollout Learning Algorithms in Multi-agent MDPs

Parallel rollout algorithms for multi-agent MDPs
Abstract: This paper studies learning in multi-agent Markov decision processes (MDPs) on the basis of rollout algorithms (RA), within the framework of performance-potential theory. Neuro-dynamic programming (NDP) approximation is used to reduce the algorithms' space complexity and thus mitigate the "curse of dimensionality". Because rollout algorithms have strong intrinsic parallelism, a parallel solution method is also analyzed to shorten running time. Finally, a simulation of multi-level inventory control in a supply-chain environment verifies the effectiveness of rollout algorithms for multi-agent learning.
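The one-step rollout idea summarized in the abstract can be sketched as follows. This is a minimal, hypothetical single-item inventory example, not the paper's actual multi-level model: the state (stock on hand), cost parameters, demand distribution, and order-up-to base policy are all illustrative assumptions. Candidate actions are evaluated in parallel, reflecting the intrinsic parallelism the paper exploits:

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical single-item inventory model: state = stock on hand,
# action = order quantity, costs = holding + stockout penalty.
HOLD_COST, PENALTY, MAX_STOCK = 1.0, 4.0, 10

def step(stock, order, demand):
    """One period: receive the order, face demand, return (next_stock, cost)."""
    stock = min(stock + order, MAX_STOCK)
    unmet = max(demand - stock, 0)
    stock = max(stock - demand, 0)
    return stock, HOLD_COST * stock + PENALTY * unmet

def base_policy(stock):
    """Heuristic base policy: order up to a fixed target level of 5."""
    return max(5 - stock, 0)

def rollout_value(stock, first_order, horizon=20, n_sims=50, seed=0):
    """Monte Carlo estimate of the cost of taking `first_order` now,
    then following the base policy for the remaining periods."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        s, acc = step(stock, first_order, rng.randint(0, 6))
        s, cost = s[0] if isinstance(s, tuple) else s, acc[1] if isinstance(acc, tuple) else acc
        s, acc = step(stock, first_order, rng.randint(0, 6))
        for _ in range(horizon - 1):
            s, cost = step(s, base_policy(s), rng.randint(0, 6))
            acc += cost
        total += acc
    return total / n_sims

def rollout_action(stock, actions=range(0, 8)):
    """One-step rollout: evaluate each candidate first action in parallel
    (the per-action simulations are independent) and pick the cheapest."""
    with ThreadPoolExecutor() as pool:
        values = list(pool.map(lambda a: rollout_value(stock, a), actions))
    return min(zip(actions, values), key=lambda av: av[1])[0]

chosen = rollout_action(0)  # best first order from an empty warehouse
```

The rollout policy improves on the base heuristic by one step of lookahead; the NDP approximation discussed in the paper would replace the Monte Carlo cost estimate with a trained value-function approximator to cut the space and simulation cost.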
Author: 李豹 (Li Bao)
Source: Journal of Anhui Polytechnic University (《安徽工程大学学报》, CAS), 2014, No. 2, pp. 75-78 (4 pages)
Keywords: rollout algorithms; neuro-dynamic programming; multi-agent learning; performance potentials; parallel algorithms

References (13)

  • 1. Littman M. Markov games as a framework for multi-agent reinforcement learning [C]// Proceedings of the Eleventh International Conference on Machine Learning. San Francisco: Morgan Kaufmann Publishers, 1994: 157-163.
  • 2. Littman M. Friend-or-foe Q-learning in general-sum Markov games [C]// Proceedings of the Eighteenth International Conference on Machine Learning. Williams College, MA. San Mateo, CA: Morgan Kaufmann Publishers, 2001: 322-328.
  • 3. Hu J, Wellman M. Nash Q-learning for general-sum stochastic games [J]. Journal of Machine Learning Research, 2003, 4: 1039-1069.
  • 4. Greenwald A, Hall K. Correlated Q-learning [C]// Proceedings of the Twentieth International Conference on Machine Learning. Washington, DC, USA: AAAI Press, 2003: 242-249.
  • 5. 李豹, 程文娟, 周雷, 唐昊. Application of rollout and its parallel algorithms to multi-item inventory control [J]. Journal of System Simulation, 2007, 19(17): 3883-3887.
  • 6. Bertsekas D P. Dynamic Programming and Optimal Control, Vol. II, 4th Edition: Approximate Dynamic Programming [M]. Belmont, MA: Athena Scientific, 2012.
  • 7. Sutton R S, Barto A G. Reinforcement Learning: An Introduction [M]. Cambridge, MA: MIT Press, 1998.
  • 8. Bertsekas D P, Tsitsiklis J N, Wu C. Rollout algorithms for combinatorial optimization [J]. Journal of Heuristics, 1997, 3: 245-262.
  • 9. Bertsekas D P. Rollout algorithms for discrete optimization: a survey [C]// Handbook of Combinatorial Optimization. Berlin: Springer, 2005: 2989-3014.
  • 10. Li X, Cao X R. Performance optimization of queueing systems with perturbation realization [J]. European Journal of Operational Research, 2012, 218(2): 293-304.

Secondary references (10)

  • 1. 代桂平, 殷保群, 王肖龙, 奚宏生. Performance optimization and iterative algorithms for controlled M/G/1 queueing systems [J]. Journal of System Simulation, 2004, 16(8): 1683-1685.
  • 2. 唐昊, 周雷, 袁继彬. A unified NDP method based on TD(0) learning for MDPs under average and discounted criteria [J]. Control Theory & Applications, 2006, 23(2): 292-296.
  • 3. 胡奇英, 刘建庸. Introduction to Markov Control Processes [M]. Xi'an: Xidian University Press, 2000.
  • 4. Bertsekas D P, Tsitsiklis J N. Neuro-Dynamic Programming [M]. Belmont, MA: Athena Scientific, 1996.
  • 5. Bertsekas D P, Tsitsiklis J N, Wu C. Rollout algorithms for combinatorial optimization [J]. Journal of Heuristics (S1381-1237), 1997, 3(3): 245-262.
  • 6. Bertsekas D P. Differential training of rollout policies [C]// Proc. of the 35th Allerton Conference on Communication, Control, and Computing. Allerton Park, IL, 1997.
  • 7. Cao X R, Chen H F. Perturbation realization, potentials and sensitivity analysis of Markov processes [J]. IEEE Trans. on Automatic Control (S0018-9286), 1997, 42(10): 1382-1393.
  • 8. Cao X R. Single sample path-based optimization of Markov chains [J]. Journal of Optimization Theory and Applications (S0022-3239), 1999, 100(3): 527-548.
  • 9. Cao X R. From perturbation analysis to Markov decision processes and reinforcement learning [J]. Discrete Event Dynamic Systems: Theory and Applications (S0924-6703), 2003, 13(1): 9-39.
  • 10. 高旭东, 殷保群, 唐昊, 奚宏生. Parallel optimization of Markov control processes based on performance-potential simulation [J]. Journal of System Simulation, 2003, 15(11): 1574-1576.
