Journal Article

An Undiscounted Reinforcement Learning Algorithm Based on Performance Potentials (Cited by: 2)
Abstract: Traditional performance-potential-based learning algorithms can obtain optimal policies for Markov decision problems (MDPs). These algorithms mainly rely on single-sample-path estimation, which limits their efficiency. Combining performance potentials with reinforcement learning, this paper proposes G-learning, an undiscounted value-iteration learning algorithm based on performance potentials, and compares it with the classic undiscounted reinforcement learning algorithm, R-learning, obtaining promising experimental results.
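For context, the classic R-learning baseline mentioned in the abstract keeps relative action values and a running estimate of the average reward, rather than discounting future rewards. A minimal sketch follows; the two-state MDP, step counts, and learning rates are illustrative assumptions, not details from the paper:

```python
import random

# Sketch of the classic R-learning baseline (average-reward RL; see
# Mahadevan 1996 in the references). The toy environment is hypothetical.

def r_learning(steps=20000, alpha=0.1, beta=0.01, eps=0.1, seed=0):
    rng = random.Random(seed)

    def step(s, a):
        # Action 1 switches state; switching out of state 1 pays reward 1.
        if a == 1:
            return 1 - s, (1 if s == 1 else 0)
        return s, 0  # action 0 stays put with reward 0

    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    rho = 0.0  # running estimate of the average reward (gain)
    s = 0
    for _ in range(steps):
        greedy = max((0, 1), key=lambda a: Q[(s, a)])
        a = rng.choice((0, 1)) if rng.random() < eps else greedy
        s2, r = step(s, a)
        best_next = max(Q[(s2, 0)], Q[(s2, 1)])
        # Undiscounted relative update: immediate reward minus average reward
        Q[(s, a)] += alpha * (r - rho + best_next - Q[(s, a)])
        # rho is adjusted only on greedy (non-exploratory) steps
        if a == greedy:
            rho += beta * (r + best_next - max(Q[(s, 0)], Q[(s, 1)]) - rho)
        s = s2
    return Q, rho
```

On this toy chain the optimal policy alternates between the two states and earns a reward every second step, so the average-reward estimate rho should approach a gain of about 0.5.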
Authors: ZHOU Ru-yi, GAO Yang
Source: Journal of Guangxi Normal University (Natural Science Edition), indexed by CAS and the Peking University Core list, 2006, No. 4, pp. 58-61 (4 pages)
Fund: Supported by the National Natural Science Foundation of China (Grant No. 60475026)
Keywords: reinforcement learning; performance potential; undiscounted; value iteration

References (9)

  • 1 MAHADEVAN S. Average reward reinforcement learning: foundations, algorithms and empirical results[J]. Machine Learning, 1996, 22: 159-195.
  • 2 SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge, MA: MIT Press, 1998.
  • 3 CAO Xi-ren, CHEN Han-fu. Perturbation realization, potentials and sensitivity analysis of Markov processes[J]. IEEE Transactions on Automatic Control, 1997, 42: 1382-1393.
  • 4 CAO Xi-ren. The relation among potentials, perturbation analysis and Markov decision processes[J]. Journal of Discrete Event Dynamic Systems, 1998, 8: 71-87.
  • 5 CAO Xi-ren. Single sample path based optimization of Markov chains[J]. Journal of Optimization Theory and Applications, 1999, 100: 527-548.
  • 6 FANG Hai-tao, CAO Xi-ren. Potential-based on-line policy iteration algorithms for Markov decision processes[J]. IEEE Transactions on Automatic Control, 2004, 49: 493-505.
  • 7 FANG Hai-tao, CAO Xi-ren. Recursive approaches for single sample path based Markov reward processes[J]. Asian Journal of Control, 2001, 3: 21-26.
  • 8 CAO Xi-ren. From perturbation analysis to Markov decision processes and reinforcement learning[J]. Journal of Discrete Event Dynamic Systems, 2003, 13: 9-39.
  • 9 MARBACH P, TSITSIKLIS J N. Simulation-based optimization of Markov reward processes[R]. Cambridge, MA: Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, 1998.

Co-cited References (11)

  • 1 JIANG Wei-jin, XU Yu-sheng, WU Quan-yuan, SUN Xing-ming. Research on a multi-agent based distributed intelligent diagnosis method[J]. Acta Electronica Sinica, 2004, 32(F12): 235-237. (Cited 8 times)
  • 2 GAO Yang, ZHOU Ru-yi, WANG Hao, CAO Zhi-xin. Research on average reward reinforcement learning algorithms[J]. Chinese Journal of Computers, 2007, 30(8): 1372-1378. (Cited 38 times)
  • 3 Szepesvari C. Algorithms for reinforcement learning: Synthesis lectures on artificial intelligence and machine learning[M]. San Rafael: Morgan & Claypool Publishers, 2009: 2-3.
  • 4 Chatterjee K, Majumdar R, Henzinger T A. Stochastic limit-average games are in EXPTIME[J]. International Journal of Game Theory, 2007, (2): 219-234.
  • 5 Tadepalli P, Ok D. Model-based average reward reinforcement learning[J]. Artificial Intelligence, 1998, (1-2): 177-224.
  • 6 Sun T, Zhao Q, Luh P B. A rollout algorithm for multichain Markov decision processes with average cost[J]. Positive Systems, 2009: 151-162.
  • 7 Li Yan-jie. An average reward performance potential estimation with geometric variance reduction[A]. 2012: 2061-2065.
  • 8 Cao X R. Stochastic learning and optimization: A sensitivity-based approach[J]. Annual Reviews in Control, 2009, (1): 11-24.
  • 9 Munos R. Geometric variance reduction in Markov chains: Application to value function and gradient estimation[J]. Journal of Machine Learning Research, 2006: 413-427.
  • 10 ZUO Guo-yu, ZHANG Hong-wei, HAN Guang-sheng. Design of a new reinforcement function based on multi-agent reinforcement learning[J]. Control Engineering of China, 2009, 16(2): 239-242. (Cited 4 times)
