An Action-sampling Based Q-learning Algorithm (一种基于动作采样的Q学习算法)

Abstract: Reinforcement learning uses the formal framework of the Markov decision process, defining the interaction between a learning agent and its environment in terms of states, actions, and rewards. In multi-agent reinforcement learning, the number of joint actions grows exponentially with the number of agents. To alleviate this problem, an action-sampling based Q-learning (ASQ) algorithm is proposed. The algorithm adopts the framework of centralized training with decentralized execution. When updating the Q-values of joint actions in the centralized training phase, it does not traverse all joint-action Q-values but only samples a subset of them. In the action selection and execution phase, each agent selects its action independently, which effectively reduces the amount of computation in the learning phase. Experimental results show that the proposed algorithm learns the optimal joint policy with a 100% success rate.
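To make the sampling idea concrete, below is a minimal tabular sketch (in Python) of how a joint-action Q-update can avoid enumerating the exponential joint-action space. It is not the authors' ASQ algorithm as published: the class name SampledJointQ, the uniform sampling of K joint actions, and the greedy decentralized policy are illustrative assumptions based only on the abstract.

```python
import random
from collections import defaultdict

# Minimal sketch of the action-sampling idea described in the abstract.
# NOT the authors' exact ASQ algorithm: the uniform sampling of K joint
# actions, the tabular representation, and the greedy decentralized policy
# below are assumptions made for illustration.

class SampledJointQ:
    def __init__(self, n_agents, n_actions, alpha=0.1, gamma=0.95, k_samples=16):
        self.n_agents = n_agents
        self.n_actions = n_actions        # actions available to each agent
        self.alpha = alpha                # learning rate
        self.gamma = gamma                # discount factor
        self.k = k_samples                # joint actions sampled per update
        self.q = defaultdict(float)       # (state, joint_action) -> Q-value

    def _sample_joint_actions(self):
        # Draw K joint actions instead of enumerating all n_actions ** n_agents.
        return [tuple(random.randrange(self.n_actions) for _ in range(self.n_agents))
                for _ in range(self.k)]

    def update(self, state, joint_action, reward, next_state, done):
        # Centralized training: the bootstrap max runs over a sampled subset
        # of joint actions only, so the update cost does not grow with the
        # exponential joint-action space.
        if done:
            target = reward
        else:
            target = reward + self.gamma * max(
                self.q[(next_state, a)] for a in self._sample_joint_actions())
        key = (state, tuple(joint_action))
        self.q[key] += self.alpha * (target - self.q[key])

    def act(self, state, agent_id):
        # Decentralized execution (assumption): each agent scores its own
        # actions against sampled actions of the other agents and picks the
        # best one, without coordinating at run time.
        best_action, best_value = 0, float("-inf")
        for own_action in range(self.n_actions):
            value = max(
                self.q[(state, tuple(own_action if i == agent_id else joint[i]
                                     for i in range(self.n_agents)))]
                for joint in self._sample_joint_actions())
            if value > best_value:
                best_action, best_value = own_action, value
        return best_action
```

With this scheme each update costs O(K) Q-value lookups rather than O(|A|^n), at the price of a bootstrap max computed over a sample rather than the full joint-action set; the abstract reports that the optimal joint policy is nonetheless learned with a 100% success rate.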
Authors: ZHAO Dejing, MA Hongcong, LIAO Dengyu, CUI Haoyan (School of Automation, Qingdao University, Qingdao 266071, China; Qingdao Petrochemical Maintenance and Installation Engineering Co., Ltd., Qingdao 266043, China)
Source: Control Engineering of China (《控制工程》), CSCD-indexed, Peking University Core Journal, 2024, No. 1, pp. 70-79 (10 pages)
Funding: Qingdao Postdoctoral Applied Research Project.
Keywords: multi-agent reinforcement learning; reinforcement learning; Q-learning; action sampling