Reinforcement Learning for the Value of Dynamic Coalition (动态联盟收益值的再励学习)

Cited by: 1

Abstract: The value of a coalition is fuzzy and uncertain and therefore hard to compute, and it is harder still to compute when the coalition's membership changes. Lerman et al. implemented a method for managing agents that join and leave a dynamic coalition. Chalkiadakis studied Bayesian reinforcement learning for coalition formation under uncertainty, but did not address how the coalition value changes as membership changes. This paper defines an estimated core with a discount factor and proposes a reinforcement learning algorithm that computes the coalition value after the membership changes, extending Chalkiadakis's work. Experimental results support the feasibility and correctness of the method.

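Authors: Tong Xiangrong (童向荣), Zhang Wei (张伟)

Affiliation: School of Computer Science and Technology, Yantai University

Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2006, No. 6, pp. 85-87 (3 pages)

Funding: Major Program of the National Natural Science Foundation of China (No. 60496323); Science and Technology Program of the Shandong Provincial Department of Education (No. JSJ03J1)

Keywords: multi-agent system, dynamic coalition formation, reinforcement learning

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]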

References (9)

1. Sandholm T, Larson K, Andersson M, et al. Coalition Structure Generation with Worst Case Guarantees[J]. Artificial Intelligence, 1999, 111(1-2): 209-238.
2. Shehory O, Kraus S. Methods for Task Allocation via Agent Coalition Formation[J]. Artificial Intelligence, 1998, 101(1-2): 165-200.
3. Mares M. Fuzzy Cooperative Games: Cooperation with Vague Expectations[M]. Physica-Verlag, 2001.
4. Sarit Kraus, Onn Shehory, Gilad Taase. Coalition Formation with Uncertain Heterogeneous Information[C]. In: AAMAS'03, Melbourne, Australia, 2003: 14-18.
5. K Lerman, O Shehory. Coalition Formation for Large Scale Electronic Markets[C]. In: Proceedings of the Fourth International Conference on MultiAgent Systems (ICMAS'2000), Boston, 2000: 216-222.
6. Georgios Chalkiadakis, Craig Boutilier. Bayesian Reinforcement Learning for Coalition Formation under Uncertainty[C]. In: AAMAS'04, New York City, USA, 2004: 19-23.
7. Leslie Pack Kaelbling, Michael L Littman, Anthony R Cassandra. Planning and Acting in Partially Observable Stochastic Domains[J]. Artificial Intelligence, 1998, 101: 99-134.
8. Michael Bowling, Manuela Veloso. Multiagent Learning Using a Variable Learning Rate[J]. Artificial Intelligence, 2002, 136: 215-250.
9. Michael Bowling, Manuela Veloso. Simultaneous Adversarial Multi-Robot Learning[C]. In: IJCAI-03, 2003: 699-704.

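Co-cited literature (4)

1. Liu Jinglei, Tong Xiangrong, Zhang Wei. A method for quickly constructing the optimal coalition structure (一种快速构建最优联盟结构的方法)[J]. Computer Engineering and Applications (计算机工程与应用), 2006, 42(4): 35-37. Cited by: 10
2. Zhang Xinliang, Shi Chunyi. A dynamic generation algorithm for multi-agent coalition structures (多Agent联盟结构动态生成算法)[J]. Journal of Software (软件学报), 2007, 18(3): 574-581. Cited by: 25
3. Hu Shanli, Shi Chunyi. An anytime coalition structure generation algorithm (一种任一时间联盟结构生成算法)[J]. Journal of Software (软件学报), 2001, 12(5): 729-734. Cited by: 33
4. Hu Shanli, Shi Chunyi. Coalition structure generation with a given bound requirement (给定限界要求的联盟结构生成)[J]. Chinese Journal of Computers (计算机学报), 2001, 24(11): 1185-1190. Cited by: 18

Citing literature (1)

1. Ren Ziyi, Tong Xiangrong. Research progress on coalition formation under constraints (约束条件下联盟生成研究进展)[J]. CAAI Transactions on Intelligent Systems (智能系统学报), 2019, 14(3): 413-422. Cited by: 1

Second-level citing literature (1)

1. Zhu Lihua, Long Haixia. Social ride-sharing problem based on graph-constrained coalition formation (基于图约束联盟形成的社会共享乘车问题)[J]. Computer Engineering and Design (计算机工程与设计), 2021, 42(4): 1089-1095.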
