Abstract
Q-learning is a classical reinforcement learning algorithm, but in discrete state spaces it suffers from a high computational load and slow convergence. Speedy Q-learning is a variant of Q-learning designed to address the slow convergence. To mitigate the "curse of dimensionality" in multi-agent reinforcement learning, an action-sampling algorithm based on Speedy Q-learning (ASSQ) is proposed. The algorithm adopts the centralized training with decentralized execution (CTDE) framework and takes the Q-value updated in the previous iteration step as the maximum Q-value of the next state, which reduces the number of Q-value comparisons and improves the overall convergence speed. To reduce computation in the learning stage, the centralized training stage does not traverse the Q-values of all joint actions when computing the maximum Q-value of the next state; it samples only part of the joint action space. In the action selection and execution stage, each agent chooses its action independently according to the learned policy, which improves the learning efficiency of the algorithm. In verification on a target transportation task, ASSQ learned the optimal joint policy with a 100% success rate, and its computational cost was significantly lower than that of Q-learning.
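The partial-sampling idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the sampling scheme, so uniform random sampling without replacement, the function names `full_max_q` and `sampled_max_q`, the `num_samples` parameter, and the toy Q-values are all assumptions made for illustration; this is not the authors' implementation.

```python
import itertools
import random


def full_max_q(q_row, joint_actions):
    """Exhaustive maximum over every joint action (baseline behaviour)."""
    return max(q_row[a] for a in joint_actions)


def sampled_max_q(q_row, joint_actions, num_samples, rng):
    """Approximate maximum from a random subset of joint actions.

    Assumption: uniform sampling without replacement; the abstract only
    says that part of the joint action space is sampled.
    """
    k = min(num_samples, len(joint_actions))
    subset = rng.sample(joint_actions, k)
    return max(q_row[a] for a in subset)


# Toy setting (hypothetical): 3 agents with 4 actions each -> 4**3 = 64 joint actions.
n_agents, n_actions = 3, 4
joint_actions = list(itertools.product(range(n_actions), repeat=n_agents))

rng = random.Random(0)
# Hypothetical Q-values for one next state, indexed by joint action.
q_row = {a: rng.random() for a in joint_actions}

exact = full_max_q(q_row, joint_actions)                               # reads 64 Q-values
approx = sampled_max_q(q_row, joint_actions, num_samples=16, rng=rng)  # reads 16 Q-values
```

The sampled value never exceeds the exhaustive maximum, and it coincides with it when `num_samples` covers the whole joint action space. The joint action space grows exponentially with the number of agents, so the saving per update grows accordingly.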
Authors
赵德京
马洪聪
王家曜
周维庆
ZHAO Dejing; MA Hongcong; WANG Jiayao; ZHOU Weiqing (School of Automation, Qingdao University, Qingdao, Shandong 266071, China; Qingdao Petrochemical Maintenance and Installation Engineering Limited Liability Company, Qingdao, Shandong 266043, China)
Source
《自动化与仪器仪表》
2022, No. 6, pp. 13-16, 22 (5 pages in total)
Automation & Instrumentation
Funding
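Qingdao Postdoctoral Applied Research Project, "AGV Road Network Design and Path Planning Method Based on Multi-Agent Reinforcement Learning".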