Reinforcement Learning for Multi-Agent Systems and Its Application in RoboCup

Abstract: Because other agents are present, the environment of a multi-agent system (MAS) cannot simply be treated as a Markov decision process (MDP), and reinforcement learning methods based on MDPs must be adapted before they can be applied to MAS. For MAS in non-deterministic Markov environments, this paper proposes a multi-agent Q-learning model and algorithm that build on each agent's independent learning ability: an agent learns the action policies of the other agents by observing and counting joint actions. The other agents' policies are represented as action probability distribution matrices, and a concise update method for these matrices is given. The full joint probability distribution over the agents' policy vectors ensures that the learning agent selects the jointly optimal action. In experiments, the algorithm was used to implement agent decision-making and improved the overall competitive ability of the AFU team, which shows that the approach is effective and feasible.
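The abstract does not state the paper's update rules, so the following is only a minimal sketch of the general idea it describes: Q-values are kept over joint actions, each other agent's policy is estimated from observed action counts, and the learner picks the action with the highest expected Q-value under those estimates. The class name `JointActionQLearner`, the uniform initial counts, the parameters `alpha` and `gamma`, and the assumption that the other agents act independently (so the joint probability is a product of per-agent probabilities) are all illustrative assumptions, not details taken from the paper.

```python
import itertools
from collections import defaultdict


class JointActionQLearner:
    """Hypothetical sketch: Q-learning over joint actions with
    count-based estimates of the other agents' policies."""

    def __init__(self, n_actions, other_n_actions, alpha=0.1, gamma=0.9):
        self.n_actions = n_actions                  # own action count
        self.other_n_actions = list(other_n_actions)  # one entry per other agent
        self.alpha = alpha                          # learning rate (assumed)
        self.gamma = gamma                          # discount factor (assumed)
        # Q[(state, own_action, others_joint_action)] -> value
        self.q = defaultdict(float)
        # counts[state][j][b]: how often other agent j chose action b in state.
        # Starting every count at 1 is an assumed uniform prior.
        self.counts = defaultdict(
            lambda: [[1] * n for n in self.other_n_actions]
        )

    def policy_matrix(self, state):
        """One row per other agent; each row is an estimated action distribution."""
        return [[c / sum(row) for c in row] for row in self.counts[state]]

    def expected_value(self, state, action):
        """Expected Q of `action`, weighting each joint action of the others
        by the product of their estimated probabilities (independence assumed)."""
        probs = self.policy_matrix(state)
        total = 0.0
        for others in itertools.product(*(range(n) for n in self.other_n_actions)):
            p = 1.0
            for j, b in enumerate(others):
                p *= probs[j][b]
            total += p * self.q[(state, action, others)]
        return total

    def best_action(self, state):
        return max(range(self.n_actions),
                   key=lambda a: self.expected_value(state, a))

    def update(self, state, action, others, reward, next_state):
        """Record the observed joint action, then apply a Q-learning update."""
        others = tuple(others)
        for j, b in enumerate(others):
            self.counts[state][j][b] += 1
        target = reward + self.gamma * max(
            self.expected_value(next_state, a) for a in range(self.n_actions)
        )
        key = (state, action, others)
        self.q[key] += self.alpha * (target - self.q[key])
```

In use, the learner would call `update` after every observed transition and `best_action` when it has to decide; an exploration scheme such as epsilon-greedy would be needed in practice and is left out here for brevity.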

Authors: Liu Guodong (刘国栋), Yang Baoqing (杨宝庆)

Affiliation: Research Center of Control Science and Engineering, Jiangnan University

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 23, pp. 46-48 (3 pages)

Keywords: multi-agent systems (MAS); reinforcement learning; Robot World Cup (RoboCup)

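Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; TP242.6 [Automation and Computer Technology: Detection Technology and Automation Devices]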
