In multiagent reinforcement learning, with different assumptions of the opponents’ policies, an agent adopts quite different learning rules, and gets different learning performances. We prove that, in multiagent doma...In multiagent reinforcement learning, with different assumptions of the opponents’ policies, an agent adopts quite different learning rules, and gets different learning performances. We prove that, in multiagent domains, convergence of the Q values is guaranteed only when an agent behaves optimally and its opponents’ strategies satisfy certain conditions, and an agent can get best learning performances when it adopts the same learning algorithm as that of its opponents.展开更多
文摘In multiagent reinforcement learning, with different assumptions of the opponents’ policies, an agent adopts quite different learning rules, and gets different learning performances. We prove that, in multiagent domains, convergence of the Q values is guaranteed only when an agent behaves optimally and its opponents’ strategies satisfy certain conditions, and an agent can get best learning performances when it adopts the same learning algorithm as that of its opponents.