Abstract
A central problem in reinforcement learning is balancing the exploration of untested actions against the exploitation of actions that are known to be good. Bayesian learning is a probabilistic method that makes optimal decisions based on known probability distributions and newly observed data. By combining Bayesian learning with reinforcement learning, the agent can choose between exploring unknown actions and exploiting the known optimal action according to its own experience and newly acquired knowledge. In this paper, we introduce both single-agent and multi-agent Bayesian reinforcement learning. Single-agent Bayesian reinforcement learning includes Bayesian Q-learning, model-based Bayesian learning, and Bayesian dynamic programming; multi-agent Bayesian reinforcement learning includes Bayesian imitation, Bayesian coordination, and Bayesian reinforcement learning for coalition formation under uncertainty. Finally, some open problems in Bayesian reinforcement learning are discussed.
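To make the exploration-exploitation idea in the abstract concrete, here is a minimal sketch of a Bayesian approach on a two-armed Bernoulli bandit using Thompson sampling. This is a hypothetical illustration, not an algorithm from the paper: each arm keeps a Beta posterior over its unknown reward probability, one value is sampled from each posterior per step, and the arm with the largest sample is pulled, so the agent naturally explores uncertain arms while exploiting arms believed to be good.

```python
import random

def thompson_sampling(true_probs, steps=2000, seed=0):
    """Thompson sampling on a Bernoulli bandit (illustrative sketch).

    Each arm's reward probability gets a Beta(wins+1, losses+1)
    posterior; sampling from the posteriors balances exploration
    of uncertain arms against exploitation of good ones.
    """
    rng = random.Random(seed)
    n = len(true_probs)
    wins = [0] * n    # observed successes per arm
    losses = [0] * n  # observed failures per arm
    pulls = [0] * n
    for _ in range(steps):
        # Draw one sample from each arm's Beta posterior.
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1)
                   for a in range(n)]
        arm = samples.index(max(samples))
        # Pull the chosen arm and update its posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.3, 0.7])
```

After enough steps the arm with the higher true reward probability (index 1 here) is pulled far more often, while the inferior arm is still tried occasionally early on, which is exactly the experience-driven trade-off the abstract describes.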
Source
Computer Science (《计算机科学》), 2006, No. 2, pp. 173-177 (5 pages). Indexed in CSCD and the Peking University Core Journal list.
Funding
Supported by the National Natural Science Foundation of China (60475026), the National "973" Key Basic Research and Development Program of China (2002CB312002), and the Natural Science Foundation of Jiangsu Province (BK2004079).