Abstract
Reinforcement learning uses the formal framework of the Markov decision process, defining the interaction between a learning agent and its environment in terms of states, actions, and rewards. In multi-agent reinforcement learning, the number of joint actions grows exponentially with the number of agents. To alleviate this problem, an action-sampling based Q-learning (ASQ) algorithm is proposed. The algorithm adopts the framework of centralized training and decentralized execution. When updating joint-action Q-values in the centralized training phase, it does not traverse all joint-action Q-values but samples only a subset of them. In the action selection and execution phase, each agent selects its action independently, which effectively reduces the computation required in the learning phase. Experimental results show that the proposed algorithm learns the optimal joint policy with a 100% success rate.
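The sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' ASQ implementation: the abstract does not give the update rule, so the sketch assumes a tabular Q-function keyed by (state, joint action), uniform sampling with replacement, and a standard Q-learning target whose maximum is taken over the sampled joint actions only. The names `sample_joint_actions`, `sampled_q_update`, and the parameters `k`, `alpha`, and `gamma` are hypothetical.

```python
import random
from collections import defaultdict


def sample_joint_actions(n_agents, n_actions, k, rng):
    """Draw k joint actions uniformly at random, with replacement (assumed scheme)."""
    return [
        tuple(rng.randrange(n_actions) for _ in range(n_agents))
        for _ in range(k)
    ]


def sampled_q_update(q, state, joint_action, reward, next_state,
                     n_agents, n_actions, k, rng, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; the max runs over k sampled joint actions,
    not over all n_actions ** n_agents of them."""
    candidates = sample_joint_actions(n_agents, n_actions, k, rng)
    target_max = max(q[(next_state, a)] for a in candidates)
    key = (state, joint_action)
    q[key] += alpha * (reward + gamma * target_max - q[key])
    return q[key]


# Hypothetical sizes: 4 agents with 5 actions each.
full_space = 5 ** 4  # number of joint actions a full sweep would visit
q = defaultdict(float)  # unseen (state, joint action) pairs start at 0.0
rng = random.Random(0)
new_value = sampled_q_update(
    q, state=0, joint_action=(1, 2, 3, 4), reward=1.0, next_state=1,
    n_agents=4, n_actions=5, k=10, rng=rng,
)
```

With these assumed sizes, each update inspects 10 joint actions instead of all 625, which is the kind of saving the abstract attributes to sampling. The sampled maximum can underestimate the true maximum, so how candidates are chosen matters in practice.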
Authors
赵德京
马洪聪
廖登宇
崔浩岩
ZHAO Dejing; MA Hongcong; LIAO Dengyu; CUI Haoyan (School of Automation, Qingdao University, Qingdao 266071, China; Qingdao Petrochemical Maintenance and Installation Engineering Co., Ltd., Qingdao 266043, China)
Source
Control Engineering of China (《控制工程》)
CSCD
Peking University Core Journal (北大核心)
2024, Issue 1, pp. 70-79 (10 pages)
Funding
Qingdao Postdoctoral Applied Research Project.
Keywords
Multi-agent reinforcement learning
Reinforcement learning
Q-learning
Action sampling