
Reinforcement Learning Model Based on Regret for Multi-Agent Conflict Games

Cited by: 6
Abstract: For conflict games, a rational yet conservative action-selection method is investigated: minimizing the agent's regret in the worst case. Under this method, the agent's current policy minimizes the loss it may incur in the future, and a Nash-equilibrium mixed policy can be obtained without any information about the other agents. Based on this regret value, a reinforcement learning model and its algorithmic implementation for conflict games in complex multi-agent environments are proposed. The model builds the agents' belief-updating process on cross-entropy distance, which further optimizes the action-selection policy in conflict games. The convergence of the algorithm is demonstrated on a repeated Markov game model, and the relationship between beliefs and the optimal policy is analyzed. In addition, compared with an extended Q-learning algorithm under MMDP (multi-agent Markov decision process), the proposed algorithm greatly reduces the number of conflicts, enhances the coordination of agent behaviors, improves system performance, and helps maintain system stability.
Authors: 肖正, 张世永
Source: Journal of Software (《软件学报》), 2008, No. 11, pp. 2957-2967 (11 pages); indexed in EI, CSCD, and the Peking University Core Journals list.
Keywords: Markov game; reinforcement learning; conflict game; conflict resolution
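
The abstract centers on two quantities: an action's regret against each possible opponent action, and the worst case of that regret over the opponent's actions. The paper's full algorithm (including the cross-entropy belief update and the Markov-game learning loop) is not reproduced here; the following is a minimal Python sketch, assuming a two-player matrix game with a known payoff matrix for the learning agent, of how worst-case-regret action selection and a belief-weighted variant can be computed. Function names and the example payoffs are illustrative, not taken from the paper.

```python
import numpy as np


def regret_matrix(payoff: np.ndarray) -> np.ndarray:
    """regret[a, b] = (best own payoff against opponent action b) - payoff[a, b]."""
    best_response_value = payoff.max(axis=0, keepdims=True)  # shape (1, n_opponent_actions)
    return best_response_value - payoff


def minimax_regret_action(payoff: np.ndarray) -> int:
    """Choose the action whose worst-case regret over opponent actions is smallest."""
    worst_case = regret_matrix(payoff).max(axis=1)  # worst regret of each own action
    return int(worst_case.argmin())


def expected_regret_action(payoff: np.ndarray, belief: np.ndarray) -> int:
    """With a belief (probability vector) over opponent actions, minimize expected regret."""
    return int((regret_matrix(payoff) @ belief).argmin())


if __name__ == "__main__":
    # Hypothetical 2x2 conflict-game payoffs for the row agent (not from the paper).
    payoff = np.array([[3.0, 0.0],
                       [2.0, 2.0]])
    print(minimax_regret_action(payoff))                         # 1: the conservative action
    print(expected_regret_action(payoff, np.array([0.9, 0.1])))  # 0: belief favors column 0
```

The minimax-regret choice is deliberately conservative, guarding against the most damaging opponent action, whereas the expected-regret variant shows how a belief over the other agent's behavior, once maintained by some update rule, can refine the choice.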
