
Distributed Scheduling Algorithm for Multiple Task Flows Based on Q-learning
Cited by: 1
Abstract: Real-time dynamic task-allocation mechanisms have recently drawn increasing attention. This paper considers task allocation when multiple task flows coexist and proposes a distributed scheduling algorithm for multiple task flows based on Q-learning. The algorithm not only adapts to the arrival process of an agent's own task flow, but also accounts for the arrivals and allocations of other task flows, thereby maximizing the long-term expected reward of the whole system. Its distributed nature makes it applicable to open multi-agent systems with only local visibility, while the use of reinforcement learning lets allocation decisions adapt to uncertainty hidden in the system environment. Experiments show that the algorithm achieves high task throughput and task-completion efficiency.
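The paper's full algorithm is not reproduced on this page. As a rough illustration of the Q-learning machinery the abstract refers to, the sketch below shows a single agent learning which neighbor to allocate incoming tasks to via tabular Q-learning. All names, the state/action/reward design, and the parameter values are hypothetical; the paper's actual formulation (which also models other agents' task flows) differs.

```python
import random
from collections import defaultdict


class TaskAllocator:
    """Minimal tabular Q-learning sketch for task allocation.

    An agent observes some state (e.g. task type plus local load
    information), picks a neighbor agent to hand the task to, and
    updates its Q-table from the observed reward. Illustrative only.
    """

    def __init__(self, neighbors, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.neighbors = neighbors            # candidate agents to allocate to
        self.alpha = alpha                    # learning rate
        self.gamma = gamma                    # discount factor
        self.epsilon = epsilon                # exploration probability
        self.q = defaultdict(float)           # Q[(state, action)], default 0.0

    def choose(self, state):
        """Epsilon-greedy choice of which neighbor receives the task."""
        if random.random() < self.epsilon:
            return random.choice(self.neighbors)
        return max(self.neighbors, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-learning backup toward the TD target."""
        best_next = max(self.q[(next_state, a)] for a in self.neighbors)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

The update rule is the textbook Q-learning backup, Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') − Q(s,a)], which is what drives the "maximize long-term expected reward" behavior described in the abstract; the distributed aspect would come from each agent running its own such learner.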
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD / Peking University core journal, 2010, No. 4, pp. 597-602 (6 pages).
Keywords: agent cooperation; task allocation; multiple task flows; Q-learning

References (12)

1. El-Rewini H, Lewis T G. Scheduling parallel program tasks onto arbitrary target machines [J]. Journal of Parallel and Distributed Computing, 1990, 9(1): 138-153.
2. Ali H, El-Rewini H. Task allocation in distributed systems [J]. Journal of Combinatorial Mathematics and Combinatorial Computing, 1993, 14(1): 15-32.
3. Shoham Y, Powers R, Grenager T. Multi-agent reinforcement learning: a critical survey [R]. Technical Report, Stanford University, 2003.
4. Zhong Yu, Gu Guo-chang, Zhang Ru-bo. A survey of distributed reinforcement learning in multi-agent systems [J]. Control Theory & Applications (控制理论与应用), 2003, 20(3): 317-322. (Cited by: 11)
5. Hanna H, Mouaddib A-I. Task selection problem under uncertainty as decision-making [C]. In: Proc. of the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2002: 1303-1308.
6. Abdallah S, Lesser V. Modeling task allocation using a decision theoretic model [C]. In: Proc. of the Fourth International Joint Conference on Autonomous Agents and Multi-Agent Systems, ACM Press, 2005: 719-726.
7. Abdallah S, Lesser V. Learning task allocation via multi-level policy gradient algorithm with dynamic learning rate [C]. In: Proc. of the Workshop on Planning and Learning in A Priori Unknown or Dynamic Domains, IJCAI, 2005: 76-82.
8. Abdallah S, Lesser V. Learning the task allocation game [C]. In: Proc. of the Fifth International Joint Conference on Autonomous Agents and Multi-Agent Systems, ACM Press, 2006: 850-857.
9. Mailler R, Lesser V. A cooperative mediation based protocol for dynamic, distributed resource allocation [J]. IEEE Transactions on Systems, Man, and Cybernetics, Part C, Special Issue on Game Theoretic Analysis and Stochastic Simulation of Negotiation Agents, 2006, 36(1): 80-91.
10. Krainin M, An B, Lesser V. An application of automated negotiation to distributed task allocation [C]. In: IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT 2007), IEEE Computer Society Press, 2007: 138-145.

