
Application of Improved Q-learning Algorithm in Multi-agent Reinforcement Learning (cited by: 1)

Abstract As a classical reinforcement learning algorithm, Q-learning suffers from a high computational load and slow convergence in discrete state spaces. Speedy Q-learning is a variant of Q-learning designed to address its slow convergence. To tackle the "curse of dimensionality" in multi-agent reinforcement learning, an action sampling based on Speedy Q-learning (ASSQ) algorithm is proposed on top of Speedy Q-learning. The algorithm adopts the centralized training with decentralized execution (CTDE) framework and takes the Q-value updated in the previous iteration step as the maximum Q-value of the next state, which effectively reduces the number of Q-value comparisons and improves the overall convergence speed. To cut the computational cost of the learning stage, the algorithm does not traverse all joint-action Q-values when computing the maximum Q-value of the next state during centralized training; it only samples part of the joint action space. In the action selection and execution stage, each agent chooses its action independently according to the learned policy, which effectively improves learning efficiency. Verified on a target transportation task, the ASSQ algorithm learns the optimal joint policy with a 100% success rate while requiring significantly less computation than the Q-learning algorithm.
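The key computational saving described in the abstract is that the maximum Q-value of the next state is estimated from a partial sample of the joint action space rather than an exhaustive sweep over all joint actions. The sketch below is a minimal tabular illustration of that idea only; the class and parameter names (SampledMaxQ, n_samples, alpha, gamma) are assumptions, and it omits the Speedy Q-learning reuse of the previous iteration's Q-value and the CTDE training loop, so it should not be read as the authors' exact ASSQ implementation.

```python
import random
from collections import defaultdict

# Illustrative sketch only: SampledMaxQ, n_samples, alpha and gamma are
# assumed names/values, not taken from the paper.

class SampledMaxQ:
    """Tabular Q-learning whose max over joint actions is estimated by sampling."""

    def __init__(self, agent_action_spaces, alpha=0.1, gamma=0.95, n_samples=20):
        self.q = defaultdict(float)        # Q-table over (state, joint_action)
        self.spaces = agent_action_spaces  # list of per-agent action lists
        self.alpha = alpha
        self.gamma = gamma
        self.n_samples = n_samples

    def sampled_max(self, state):
        # Rather than enumerating the full joint-action space (which grows
        # exponentially with the number of agents), estimate max_a Q(state, a)
        # from a small random sample of joint actions.
        candidates = [tuple(random.choice(space) for space in self.spaces)
                      for _ in range(self.n_samples)]
        return max(self.q[(state, a)] for a in candidates)

    def update(self, state, joint_action, reward, next_state, done):
        # Standard one-step temporal-difference update using the sampled max.
        target = reward if done else reward + self.gamma * self.sampled_max(next_state)
        key = (state, tuple(joint_action))
        self.q[key] += self.alpha * (target - self.q[key])
```

For a sense of scale under these assumptions: with four agents that each have five actions, an exhaustive max would compare 5^4 = 625 joint-action values per update, whereas sampling 20 candidates keeps the per-update cost constant regardless of the number of agents.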
Authors ZHAO Dejing; MA Hongcong; WANG Jiayao; ZHOU Weiqing (School of Automation, Qingdao University, Qingdao, Shandong 266071, China; Qingdao Petrochemical Maintenance and Installation Engineering Limited Liability Company, Qingdao, Shandong 266043, China)
Source Automation & Instrumentation (《自动化与仪器仪表》), 2022, No. 6, pp. 13-16, 22 (5 pages)
Fund Qingdao Postdoctoral Applied Research Project "AGV Road Network Design and Path Planning Method Based on Multi-agent Reinforcement Learning".
Keywords Q-learning; Speedy Q-learning; multi-agent reinforcement learning; action sampling

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部