Multi-agent reinforcement learning for symmetric coordination games (cited by: 2)

Published English title: Multi-agents reinforcement learning for symmetrical coordination
Abstract To address the multi-robot coordination problem, the paper exploits the similarity of agents' strategies in coordination games and proposes a higher-order belief revision model for agents together with a learning method, Position-Exchanging Learning (PEL). Through position exchanging, each agent reasons from its opponent's point of view to infer the opponent's actions; belief revision then combines the objectively observed actions with the subjectively inferred ones. The paper proves that coordination succeeds even when the inference confidence of the belief revision model is adjusted only between the two values 0 and 1. Simulations of multi-robot collision avoidance indicate that PEL achieves better coordination performance than existing methods.
Authors Wang Yun (王云), Han Wei (韩伟)
Source Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 36, pp. 230-233, 248 (5 pages)
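Funding National Natural Science Foundation of China (No. 70802025); "Qing Lan Project" of the Jiangsu Provincial Department of Education; Natural Science Guidance Program of the Jiangsu Provincial Department of Education (No. 07KJD520070)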
Keywords multi-agent system (MAS); reinforcement learning; coordination games
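
The abstract describes a belief revision step that mixes observed opponent actions with actions inferred by position exchanging, using a confidence restricted to 0 or 1. The record does not give the paper's formulas, so the following is only a minimal sketch of that idea under stated assumptions: the function names, the linear blending rule, the deterministic policy, and the match-based confidence update are all hypothetical illustrations, not the authors' PEL algorithm.

```python
from collections import Counter


def observed_belief(history, actions):
    """Empirical frequency of the opponent's past actions (uniform if none seen)."""
    if not history:
        return {a: 1.0 / len(actions) for a in actions}
    counts = Counter(history)
    return {a: counts[a] / len(history) for a in actions}


def inferred_belief(own_policy, opponent_state, actions):
    """Position exchange (assumed form): apply one's own policy to the opponent's state."""
    predicted = own_policy(opponent_state)
    return {a: 1.0 if a == predicted else 0.0 for a in actions}


def revised_belief(observed, inferred, confidence):
    """Assumed blending rule: confidence weights the inferred belief."""
    return {
        a: confidence * inferred[a] + (1.0 - confidence) * observed[a]
        for a in observed
    }


def update_confidence(inferred, actual_action):
    """Assumed update: confidence is 1 if the inference matched reality, else 0."""
    predicted = max(inferred, key=inferred.get)
    return 1.0 if predicted == actual_action else 0.0


# Hypothetical collision-avoidance setting with two symmetric robots.
ACTIONS = ["left", "right"]


def policy(state):
    # Toy deterministic policy shared by both robots (an assumption).
    return "left" if state == "north" else "right"


obs = observed_belief(["right", "right", "left", "right"], ACTIONS)
inf = inferred_belief(policy, "north", ACTIONS)
trusting = revised_belief(obs, inf, 1.0)   # rely fully on the inference
skeptical = revised_belief(obs, inf, 0.0)  # rely fully on observations
```

With confidence 1 the revised belief equals the inferred belief, and with confidence 0 it falls back to the observed frequencies, which is one simple way to read the claim that only the two values 0 and 1 are needed.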
