
Research on Multi-agent Q Learning Algorithm Based on Meta Equilibrium (基于Meta平衡的多Agent Q学习算法研究; cited by: 1)
Abstract: Research on multi-agent reinforcement learning has largely focused on cooperative strategies; the NashQ algorithm was an important contribution to the study of non-cooperative strategies. In multi-agent systems, however, a Nash equilibrium does not guarantee that the solution obtained is Pareto-optimal, and NashQ's computational complexity is high. To address these problems, this paper proposes MetaQ, an algorithm based on the Meta equilibrium. Unlike NashQ, MetaQ obtains an optimal joint policy by pre-processing its own actions and predicting the actions of the other agents. Experiments on a climate-cooperation strategy game show that MetaQ has a sound theoretical interpretation and good empirical performance on non-cooperative problems.
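For background on the NashQ baseline that the abstract contrasts MetaQ against, the published NashQ update backs up each agent's Q-value through a Nash equilibrium of the next-state stage game: Q1(s, a1, a2) ← (1−α)·Q1(s, a1, a2) + α·(r1 + γ·NashQ1(s′)). The sketch below illustrates one such backup for two agents with pure-strategy equilibria found by brute force; all function names and the tiny payoff tables are illustrative, not taken from the paper.

```python
import itertools

def pure_nash(q1, q2):
    """Brute-force a pure-strategy Nash equilibrium of a two-player
    stage game. q1[a1][a2] and q2[a1][a2] are the agents' payoff
    tables; returns the first equilibrium joint action, else None."""
    n1, n2 = len(q1), len(q1[0])
    for a1, a2 in itertools.product(range(n1), range(n2)):
        # a1 is a best response to a2, and a2 to a1
        best1 = all(q1[a1][a2] >= q1[b][a2] for b in range(n1))
        best2 = all(q2[a1][a2] >= q2[a1][b] for b in range(n2))
        if best1 and best2:
            return a1, a2
    return None  # only mixed equilibria exist

def nashq_update(q, r, next_q1, next_q2, a1, a2, alpha=0.1, gamma=0.9):
    """One NashQ backup for agent 1 after joint action (a1, a2)
    yielded reward r and led to a state whose stage game is
    (next_q1, next_q2):
        Q1(s,a1,a2) <- (1-alpha)*Q1(s,a1,a2)
                       + alpha*(r + gamma * NashQ1(s'))"""
    eq = pure_nash(next_q1, next_q2)
    nash_val = next_q1[eq[0]][eq[1]] if eq else 0.0
    q[a1][a2] = (1 - alpha) * q[a1][a2] + alpha * (r + gamma * nash_val)
    return q[a1][a2]
```

In a Prisoner's-Dilemma-like stage game, mutual defection is the unique Nash equilibrium but is not Pareto-optimal, which is exactly the shortcoming of NashQ that motivates the paper's Meta-equilibrium approach.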
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2012, No. B06: 261-264 (4 pages).
Funding: National Natural Science Foundation of China (60874074); Zhejiang Provincial Major Science and Technology Project (2009C11039).
Keywords: Reinforcement learning; Meta equilibrium; NashQ; Multi-agent system

References (15; 10 shown)

  • 1. Hu Jun-ling, Wellman M P. Multiagent reinforcement learning: theoretical framework and an algorithm[C]//Proceedings of the Fifteenth International Conference on Machine Learning. 1998: 242-250.
  • 2. Wang Hao, Gao Yang, Chen Xing-guo. RL-DOT: A Reinforcement Learning NPC Team for Playing Domination Games[J]. IEEE Transactions on Computational Intelligence and AI in Games, 2010, 2(1): 17-26.
  • 3. Littman M L. Friend-or-foe Q-learning in general-sum games[C]//Proceedings of the Eighteenth International Conference on Machine Learning. Williams College: Morgan Kaufmann, 2001: 322-328.
  • 4. Greenwald A, Hall K, Serrano R. Correlated-Q learning[C]//Proceedings of the Twentieth International Conference on Machine Learning. Washington DC, 2003: 242-249.
  • 5. Zhao Feng-qiang, Xu Yi, Li Guang-qiang. Research on multi-objective evolutionary algorithms based on the island population model[J]. Computer Science, 2010, 37(12): 190-192. (in Chinese)
  • 6. Song Mei-ping, Gu Guo-chang, Zhang Guo-yin, Liu Hai-bo. Cooperative multi-agent learning in general-sum games[J]. Control Theory & Applications, 2007, 24(2): 317-321. (in Chinese)
  • 7. Hu Jun-ling, Wellman M P. Nash Q-learning for general-sum stochastic games[J]. Journal of Machine Learning Research, 2003, 4(11): 1039-1069.
  • 8. Vassiliades V, Cleanthous A, Christodoulou C. Multiagent reinforcement learning: Spiking and nonspiking agents in the iterated Prisoner's Dilemma[J]. IEEE Transactions on Neural Networks, 2011, 22(4): 639-653.
  • 9. Aumann R J, Hart S. Computing equilibria for two-person games. Handbook of Game Theory with Economic Applications[R]. Amsterdam: Elsevier, 2002.
  • 10. Murty K G. Computational complexity of complementary pivot methods[C]//Mathematical Programming Study 7: Complementarity and Fixed Point Problems. Amsterdam: North-Holland Publishing Co, 1978: 61-73.


