
Application of Improved Q-learning Algorithm in Multi-agent Reinforcement Learning (cited by: 1)

Abstract As a classical reinforcement learning algorithm, Q-learning suffers from a high computational load and slow convergence in discrete state spaces. Speedy Q-learning is a variant of Q-learning designed to address its slow convergence. To tackle the "curse of dimensionality" in multi-agent reinforcement learning, an action sampling based on Speedy Q-learning (ASSQ) algorithm is proposed on top of Speedy Q-learning. The algorithm adopts the centralized training with decentralized execution (CTDE) framework and takes the Q-value updated in the previous iteration step as the maximum Q-value of the next state, which effectively reduces the number of Q-value comparisons and improves the overall convergence speed. To cut the computational cost of the learning stage, the algorithm does not traverse all joint-action Q-values when computing the maximum Q-value of the next state during centralized training; it only samples part of the joint action space. In the action selection and execution stage, each agent chooses its action independently according to the learned policy, which effectively improves learning efficiency. Verified on a target transportation task, the ASSQ algorithm learns the optimal joint policy with a 100% success rate while requiring significantly less computation than the Q-learning algorithm.
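The key computational saving described in the abstract is that the maximum Q-value of the next state is estimated from a partial sample of the joint action space rather than an exhaustive sweep over all joint actions. The sketch below is a minimal tabular illustration of that idea only; the class and parameter names (SampledMaxQ, n_samples, alpha, gamma) are assumptions, and it omits the Speedy Q-learning reuse of the previous iteration's Q-value and the CTDE training loop, so it should not be read as the authors' exact ASSQ implementation.

```python
import random
from collections import defaultdict

# Illustrative sketch only: SampledMaxQ, n_samples, alpha and gamma are
# assumed names/values, not taken from the paper.

class SampledMaxQ:
    """Tabular Q-learning whose max over joint actions is estimated by sampling."""

    def __init__(self, agent_action_spaces, alpha=0.1, gamma=0.95, n_samples=20):
        self.q = defaultdict(float)        # Q-table over (state, joint_action)
        self.spaces = agent_action_spaces  # list of per-agent action lists
        self.alpha = alpha
        self.gamma = gamma
        self.n_samples = n_samples

    def sampled_max(self, state):
        # Rather than enumerating the full joint-action space (which grows
        # exponentially with the number of agents), estimate max_a Q(state, a)
        # from a small random sample of joint actions.
        candidates = [tuple(random.choice(space) for space in self.spaces)
                      for _ in range(self.n_samples)]
        return max(self.q[(state, a)] for a in candidates)

    def update(self, state, joint_action, reward, next_state, done):
        # Standard one-step temporal-difference update using the sampled max.
        target = reward if done else reward + self.gamma * self.sampled_max(next_state)
        key = (state, tuple(joint_action))
        self.q[key] += self.alpha * (target - self.q[key])
```

For a sense of scale under these assumptions: with four agents that each have five actions, an exhaustive max would compare 5^4 = 625 joint-action values per update, whereas sampling 20 candidates keeps the per-update cost constant regardless of the number of agents.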
Authors ZHAO Dejing; MA Hongcong; WANG Jiayao; ZHOU Weiqing (School of Automation, Qingdao University, Qingdao, Shandong 266071, China; Qingdao Petrochemical Maintenance and Installation Engineering Limited Liability Company, Qingdao, Shandong 266043, China)
Source Automation & Instrumentation (《自动化与仪器仪表》), 2022, No. 6, pp. 13-16, 22 (5 pages)
Fund Qingdao Postdoctoral Applied Research Project "AGV Road Network Design and Path Planning Method Based on Multi-agent Reinforcement Learning".
Keywords Q-learning; Speedy Q-learning; multi-agent reinforcement learning; action sampling

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部