Abstract
In general-sum games, multiagent coordination that considers only individual rationality has no global objective. Each agent learns under an assumption about its opponents' policies, and nothing guarantees that the assumption is correct. To address this, the paper defines a collective objective for agent coordination and proposes a multiagent reinforcement learning algorithm based on negotiation among the agents. The agents select a negotiated policy and punish any agent that deviates from it, which ensures that the negotiated policy is executed. The paper gives conditions under which learning converges, with a proof, and analyzes an example.
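The enforcement idea in the abstract (agree on a joint policy, then punish whoever deviates) can be illustrated with a small repeated matrix game. The sketch below is not the paper's algorithm and contains no learning step. It rests on assumptions that do not come from the source: a hypothetical prisoner's-dilemma payoff table `PAYOFF`, a collective objective equal to the sum of both agents' payoffs, and a permanent minmax punishment once a deviation is observed. All function names are illustrative.

```python
# Hypothetical stage game (assumption, not from the paper).
# PAYOFF[i][a0][a1] is agent i's reward when agent 0 plays a0 and agent 1 plays a1.
# Action 0 = cooperate, action 1 = defect.
PAYOFF = [
    [[3, 0], [5, 1]],  # agent 0
    [[3, 5], [0, 1]],  # agent 1
]
ACTIONS = (0, 1)


def reward(payoff, agent, joint):
    return payoff[agent][joint[0]][joint[1]]


def with_action(joint, agent, action):
    """Return a copy of the joint action with one agent's action replaced."""
    changed = list(joint)
    changed[agent] = action
    return tuple(changed)


def negotiated_joint_action(payoff):
    """Assumed collective objective: maximize the sum of both agents' payoffs."""
    joints = [(a0, a1) for a0 in ACTIONS for a1 in ACTIONS]
    return max(joints, key=lambda j: reward(payoff, 0, j) + reward(payoff, 1, j))


def best_response(payoff, agent, joint):
    """Action that maximizes the agent's own reward, other agent held fixed."""
    return max(ACTIONS, key=lambda a: reward(payoff, agent, with_action(joint, agent, a)))


def punishing_action(payoff, deviator):
    """Punisher's action that minimizes the deviator's best achievable reward (minmax)."""
    punisher = 1 - deviator

    def deviator_best(p_action):
        base = with_action((0, 0), punisher, p_action)
        return max(reward(payoff, deviator, with_action(base, deviator, a)) for a in ACTIONS)

    return min(ACTIONS, key=deviator_best)


def simulate(payoff, rounds, deviate_at=None, deviator=0):
    """Play the negotiated joint action; after a deviation, punish the deviator forever."""
    agreed = negotiated_joint_action(payoff)
    punished = None
    totals = [0, 0]
    for t in range(rounds):
        joint = agreed
        if punished is not None:
            # The other agent punishes; the deviator best-responds to the punishment.
            joint = with_action(joint, 1 - punished, punishing_action(payoff, punished))
            joint = with_action(joint, punished, best_response(payoff, punished, joint))
        elif t == deviate_at:
            # The deviator abandons the agreement for its one-shot best response.
            joint = with_action(joint, deviator, best_response(payoff, deviator, joint))
        for i in (0, 1):
            totals[i] += reward(payoff, i, joint)
        if punished is None and joint != agreed:
            punished = deviator
    return totals
```

With this payoff table, `simulate(PAYOFF, 10)` gives both agents a total of 30, while `simulate(PAYOFF, 10, deviate_at=0)` gives the deviating agent 14. The one-round gain from deviating is outweighed by the punishment that follows, which is the incentive that keeps the negotiated action in place in this toy setting.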
Source
《上海交通大学学报》
EI
CAS
CSCD
PKU Core (Peking University core journal list)
2005, Issue S1, pp. 108-112 (5 pages)
Journal of Shanghai Jiaotong University
Keywords
Markov games
reinforcement learning
multiagent coordination
negotiation