A Cooperation Markov Decision Process System
Abstract: Reinforcement learning is an important research area in machine learning, and the Markov decision process (MDP) is one of its foundations. A conventional Markov decision system models the learning evolution of only a single agent, which is limiting for many current problems, since more and more applications involve multiple agents. This paper therefore introduces a cooperation Markov decision process (CMDP) system with two agents, suited to the learning evolution of cooperative decision-making between two agents. Interaction between agents can be of two types, cooperation or game-playing; this paper focuses on the cooperative CMDP. In this learning model, the agents perform actions alternately, take social value as the optimization criterion, and seek an optimal policy pair (π*0, π*1) with which they jointly accomplish the target task. An algorithm for finding the optimal policy pair in the joint Markov system is further given; its essential task is to find an optimal policy pair (π*0, π*1) that forms a cooperative system CMDP(π*0, π*1). The system model can also be extended to a joint decision system with multiple agents.
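The model described in the abstract can be sketched in code. The following is a minimal illustrative sketch, not the paper's exact algorithm: value iteration over a cooperative two-agent MDP in which the agents act alternately on a tiny 4-cell line world (the world, actions, and reward are all assumptions for illustration), a single shared "social" reward is maximized, and a greedy policy pair (pi0, pi1) is read off the converged value function.

```python
# Sketch: joint value iteration for a two-agent cooperative MDP with
# alternating turns and a shared (social) reward. The 4-cell line world,
# actions, and reward function are illustrative assumptions.

GAMMA = 0.9
POSITIONS = range(4)   # a tiny 4-cell line world (assumption)
ACTIONS = (-1, +1)     # move left / move right
GOAL = 3               # shared goal cell; reaching it yields the shared reward

def step(pos, action):
    """Deterministic transition: move, clamped to the ends of the line."""
    new_pos = min(max(pos + action, 0), GOAL)
    reward = 1.0 if new_pos == GOAL else 0.0   # social (shared) reward
    return new_pos, reward

def solve(iters=100):
    # State = (position, turn); turn in {0, 1} says which agent moves next.
    V = {(p, t): 0.0 for p in POSITIONS for t in (0, 1)}
    for _ in range(iters):
        for p, t in list(V):
            if p == GOAL:
                continue                        # goal state is absorbing
            V[(p, t)] = max(
                r + GAMMA * V[(np_, 1 - t)]     # turn passes to the other agent
                for a in ACTIONS
                for np_, r in [step(p, a)]
            )
    # Greedy policy pair (pi0, pi1) extracted from the shared value function.
    pi = {
        (p, t): max(
            ACTIONS,
            key=lambda a: step(p, a)[1] + GAMMA * V[(step(p, a)[0], 1 - t)],
        )
        for p in POSITIONS for t in (0, 1) if p != GOAL
    }
    return V, pi

V, pi = solve()
```

Because the reward is shared, the two agents' greedy policies agree (both always move toward the goal), which is the cooperative case the paper studies; a competitive (game) setting would instead require per-agent rewards and an equilibrium notion.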
Authors: LEI Ying, XU Dao-yun (School of Computer Science and Technology, Guizhou University, Guiyang 550025, China)
Source: Computer Technology and Development, 2020, No. 12, pp. 8-14.
Funding: National Natural Science Foundation of China (61762019, 61862051).
Keywords: reinforcement learning; agent; cooperation Markov decision process; optimal pair of strategies; algorithm
