

Two-Timescale Simulation-based Algorithm for Markov Decision Process Based on Performance Potentials
Abstract: By introducing the concept of two timescales into performance-potential-based stochastic approximation, a novel two-timescale simulation-based gradient algorithm is proposed for discrete-time Markov decision processes. The algorithm remedies the limitations of classical approaches, in which the every-update gradient algorithm updates too frequently and the regenerative-update gradient algorithm updates too infrequently. Three numerical examples illustrate the advantages of the two-timescale algorithm in computational complexity, convergence speed, and convergence precision.
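The core tool the abstract builds on is two-timescale stochastic approximation: a fast iterate tracks a noisy estimate while a slow iterate optimizes against it as if it were quasi-static. The sketch below is a generic illustration of that coupling, not the paper's algorithm; the target value `c`, the quadratic objective, and the step-size exponents are illustrative assumptions.

```python
import random

# Generic two-timescale stochastic approximation sketch (illustrative only).
# Fast iterate y: estimates an unknown mean c from noisy samples.
# Slow iterate x: takes gradient steps on (x - y)^2 / 2, treating y as
# quasi-static because its step size shrinks faster (b_n / a_n -> 0).

random.seed(0)
c = 2.0   # unknown quantity, observed only through noise (assumed for the demo)
y = 0.0   # fast iterate
x = 0.0   # slow iterate

for n in range(1, 200_001):
    a_n = n ** -0.6             # fast step size (decays slowly)
    b_n = n ** -0.9             # slow step size (decays faster)
    sample = random.gauss(c, 1.0)
    y += a_n * (sample - y)     # fast timescale: noisy averaging toward c
    x += b_n * (y - x)          # slow timescale: gradient step using y

# Both iterates settle near c: y estimates the noisy quantity,
# and x converges to the same limit while seeing y as almost constant.
print(round(y, 2), round(x, 2))
```

In the paper's setting the fast timescale would estimate performance potentials from the simulated trajectory and the slow timescale would update the policy parameters, which is what lets the update rate sit between the every-update and regenerative-update extremes.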
Published in: Journal of System Simulation (《系统仿真学报》; CAS, CSCD, Peking University core journal), 2009, No. 13, pp. 4114-4119.
Funding: National Natural Science Foundation of China (60574065, 60774038)
Keywords: Markov decision process; performance potential; two-timescale; stochastic approximation
