Multiagent Reinforcement Learning Based on Negotiation in General-Sum Games

Abstract: In general-sum games, multiagent cooperation that considers only individual rationality has no global objective. Each agent learns under an assumption about its opponents' policies, and that assumption is not guaranteed to be correct. To address this, the paper defines a collective objective for agent cooperation and proposes a reinforcement learning algorithm based on multiagent negotiation. Agents select negotiated policies and punish any agent that deviates from them, which ensures that the negotiated policies are carried out. The paper states conditions under which learning converges, proves them, and analyzes the algorithm with an example.
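The abstract does not specify the collective objective or the punishment rule. The following is therefore only a minimal sketch under stated assumptions, not the authors' algorithm. It uses a single-state repeated two-player Prisoner's Dilemma. It takes the hypothetical collective objective to be the sum of both agents' joint-action Q-values. It models punishment as a fixed number of rounds in which the other agent plays whichever action minimizes the deviator's best attainable Q-value. The names `negotiate`, `punishment_action`, `learn`, and `punish_steps` are illustrative, not taken from the paper.

```python
import itertools
import random

# Stage-game payoffs R[i][a0][a1] for agent i under joint action (a0, a1).
# Assumed example game: Prisoner's Dilemma, 0 = cooperate, 1 = defect.
R = [
    [[3.0, 0.0], [5.0, 1.0]],  # agent 0
    [[3.0, 5.0], [0.0, 1.0]],  # agent 1
]
ACTIONS = (0, 1)
JOINT = list(itertools.product(ACTIONS, ACTIONS))


def negotiate(Q):
    """Hypothetical collective objective: the joint action with the largest summed Q-value."""
    return max(JOINT, key=lambda ja: sum(q[ja] for q in Q))


def punishment_action(Q, punisher, deviator):
    """Punisher's action that minimizes the deviator's best attainable Q-value."""
    def deviator_best(a_p):
        pairs = [(a_p, a_d) if punisher == 0 else (a_d, a_p) for a_d in ACTIONS]
        return max(Q[deviator][ja] for ja in pairs)
    return min(ACTIONS, key=deviator_best)


def learn(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, punish_steps=3, seed=0):
    rng = random.Random(seed)
    Q = [{ja: 0.0 for ja in JOINT} for _ in range(2)]  # joint-action Q-table per agent
    punish = [0, 0]  # remaining punishment rounds against agent 0 and agent 1
    for _ in range(episodes):
        target = negotiate(Q)  # negotiated joint action for this round
        actions = list(target)
        punishing = [punish[1 - i] > 0 for i in range(2)]
        for i in range(2):
            if punishing[i]:
                actions[i] = punishment_action(Q, i, 1 - i)
            elif rng.random() < epsilon:
                actions[i] = rng.choice(ACTIONS)  # exploration; may count as a deviation
        # Count down active punishments, then flag new deviations from the negotiated policy.
        punish = [max(0, p - 1) for p in punish]
        for i in range(2):
            if not punishing[i] and actions[i] != target[i]:
                punish[i] = punish_steps
        ja = tuple(actions)
        for i in range(2):
            r = R[i][ja[0]][ja[1]]
            # Single state: bootstrap from the value of the negotiated joint action.
            Q[i][ja] += alpha * (r + gamma * Q[i][target] - Q[i][ja])
    return Q, negotiate(Q)


if __name__ == "__main__":
    Q, joint = learn()
    print("negotiated joint action:", joint)
```

With these payoffs the negotiated joint action should settle on mutual cooperation `(0, 0)` rather than the stage game's Nash equilibrium of mutual defection. Both agents update the same joint action and bootstrap from the same negotiated action. As a result, in this sketch the summed Q-values follow ordinary Q-learning on the team reward. That observation is specific to these assumptions and is not a restatement of the paper's convergence conditions.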

Authors: ZHANG Hua-xiang, ZHAO Tong, HUANG Shang-teng

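Affiliations: School of Information Management, Shandong Normal University; College of Automation and Electronic Engineering, Qingdao University of Science and Technology; Department of Computer Science and Engineering, Shanghai Jiao Tong University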

Source: Journal of Shanghai Jiaotong University, 2005, Issue S1, pp. 108-112 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

Keywords: Markov games; reinforcement learning; multiagent coordination; negotiation

CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
