Abstract
This paper first introduces the reinforcement learning model. It then presents seven main reinforcement learning algorithms, including dynamic programming, the Monte Carlo method, temporal-difference learning, and Q-learning, and discusses the differences and connections among them. Finally, directions for future research are proposed.
Keywords
reinforcement learning
dynamic programming
Monte Carlo method
temporal-difference method