Research on Multi-Agent Cooperation Based on Reinforcement Learning (基于强化学习的多Agent协作研究), cited by: 5

Cooperative Multi-agent Systems Based on Reinforcement Learning

Abstract: Reinforcement learning provides a robust learning method for cooperation among multiple agents. This paper first introduces the principles and components of reinforcement learning, then describes the multi-agent Markov decision process (MMDP) and gives a reinforcement learning model for agents. On this basis, it compares the two modes of reinforcement learning that arise in multi-agent cooperation: IL (independent learning) and JAL (joint action learning). Finally, it analyzes, for the case where multiple optimal policies exist, cooperative multi…

Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in fully cooperative multi-agent systems (MAS). This paper first introduces the basic principles and components of reinforcement learning, then describes the multi-agent extension of the MDP (MMDP) and presents a reinforcement learning model for agents in cooperative MAS. It then distinguishes reinforcement learners that ignore the presence of other agents from those that explicitly attempt to learn the value of joint actions and the strategies of their counterparts. Finally, some simple and commonly used coordination mechanisms are examined.
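The distinction the abstract draws between independent learners and joint-action learners can be illustrated with a small sketch. This is not the paper's implementation: the 2x2 payoff matrix `REWARD`, the class names, the learning rate, and the exploration rate are all hypothetical choices made for illustration, and the game is reduced to a single repeated state. An independent learner keeps one value per own action, while a joint-action learner keeps one value per joint action and weights it by the observed frequency of the partner's actions.

```python
import random

# Hypothetical 2x2 fully cooperative game: both agents receive the same reward.
REWARD = [[10.0, 0.0],
          [0.0, 5.0]]
ACTIONS = [0, 1]


def epsilon_greedy(values, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(values)
    return rng.choice([a for a in ACTIONS if values[a] == best])


class IndependentLearner:
    """IL: one value per own action; the other agent is ignored."""

    def __init__(self):
        self.q = [0.0, 0.0]

    def action_values(self):
        return list(self.q)

    def update(self, own, other, reward, alpha):
        self.q[own] += alpha * (reward - self.q[own])


class JointActionLearner:
    """JAL: one value per joint action, plus counts of the partner's actions."""

    def __init__(self):
        self.q = [[0.0, 0.0], [0.0, 0.0]]  # indexed as q[own][other]
        self.counts = [1, 1]               # partner action counts (uniform prior)

    def action_values(self):
        # Expected value of each own action under the partner's empirical frequencies.
        total = sum(self.counts)
        return [sum(self.q[a][b] * self.counts[b] / total for b in ACTIONS)
                for a in ACTIONS]

    def update(self, own, other, reward, alpha):
        self.q[own][other] += alpha * (reward - self.q[own][other])
        self.counts[other] += 1


def run(agent_cls, episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Let two agents of the same kind play the repeated game and learn."""
    rng = random.Random(seed)
    agents = [agent_cls(), agent_cls()]
    for _ in range(episodes):
        a0 = epsilon_greedy(agents[0].action_values(), epsilon, rng)
        a1 = epsilon_greedy(agents[1].action_values(), epsilon, rng)
        r = REWARD[a0][a1]  # shared reward (the matrix is symmetric)
        agents[0].update(a0, a1, r, alpha)
        agents[1].update(a1, a0, r, alpha)
    return agents
```

Calling `run(IndependentLearner)` and `run(JointActionLearner)` returns the trained agents, whose `q` tables can then be compared. The two coordinated joint actions pay different amounts here, which gives a simple setting for checking whether the agents settle on the better one.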

Authors: Zheng Shuli (郑淑丽), Han Jianghong (韩江洪), Luo Xiangfeng (骆祥峰), Jiang Jianwen (蒋建文)

Affiliation: School of Computer and Information, Hefei University of Technology (合肥工业大学计算机与信息学院)

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2003, No. 11, pp. 1986-1988 (3 pages)

Funding: Supported by the Natural Science Foundation of Anhui Province (00043115)

Keywords: multi-agent system; reinforcement learning; multi-agent MDP (MMDP); coordination mechanisms

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]


