Research on Reinforcement Learning Based on Neural Networks: A Summary (cited by: 4)
Abstract: As reinforcement learning has developed and research on it has deepened, introducing neural networks into reinforcement learning has become an active research topic. This paper first introduces the definition, principle, and general structure of reinforcement learning, then briefly describes the basics of neural networks and the Markov decision process model. It then combines reinforcement learning with neural networks and focuses on two common learning algorithms, both of which are improved variants that incorporate characteristics of neural networks. Finally, it briefly reviews applications of this form of reinforcement learning in artificial intelligence, control systems, games, and optimization and scheduling.
Authors: YOU Shu-hua, ZHOU Yi-cheng, WANG Hui (College of Computer Science and Technology, Soochow University, Suzhou 215006, China)
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2012, No. 10, pp. 6782-6786 (5 pages)
Keywords: reinforcement learning; neural network; Markov decision process; algorithm; application
