Abstract
By introducing the concept of two time scales into performance-potential-based stochastic approximation, this paper proposes a two-time-scale simulation-based gradient algorithm, built on performance potentials, for discrete-time Markov decision processes. The algorithm addresses two shortcomings of classical approaches: the every-step-update gradient algorithm updates too frequently, while the regenerative-cycle-update gradient algorithm updates too infrequently. Three numerical examples illustrate the advantages of the two-time-scale algorithm in computational complexity, convergence speed, and convergence accuracy.
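The abstract does not give the update rules, so the following is only a minimal sketch of a generic two-time-scale stochastic approximation scheme, not the paper's algorithm. Everything in it is an assumption made for illustration: the two-state chain, the cost function, the step sizes, the projection interval, and all function names are hypothetical. A fast iterate tracks the average cost and the performance potentials with a temporal-difference update, while a slow iterate moves the parameter `theta` along a gradient sample built from the potential-based derivative formula dη/dθ = π[(dP/dθ)g + df/dθ].

```python
import random


def cost(state, theta):
    # Hypothetical cost: state 0 is expensive, and leaving it quickly
    # (large theta) adds a quadratic effort term. State 1 is free.
    return 2.0 + 4.0 * theta * theta if state == 0 else 0.0


def average_cost(theta):
    # Closed-form average cost of this toy chain (used only for checking):
    # the stationary probability of state 0 is 0.5 / (theta + 0.5).
    return 0.5 / (theta + 0.5) * cost(0, theta)


def two_timescale_gradient(theta0, steps, seed=0):
    rng = random.Random(seed)
    theta, state = theta0, 0
    eta = 0.0        # fast iterate: running estimate of the average cost
    g = [0.0, 0.0]   # fast iterate: estimates of the performance potentials
    for n in range(1, steps + 1):
        a_n = (10.0 + n) ** -0.6   # fast step size (assumed schedule)
        b_n = 1.0 / (100.0 + n)    # slow step size, b_n / a_n -> 0

        # Simulate one transition: 0 -> 1 with probability theta,
        # 1 -> 0 with probability 0.5 (independent of theta).
        if state == 0:
            nxt = 1 if rng.random() < theta else 0
        else:
            nxt = 0 if rng.random() < 0.5 else 1

        # Fast time scale: temporal-difference update of the potentials
        # and running average of the cost.
        f = cost(state, theta)
        g[state] += a_n * (f - eta + g[nxt] - g[state])
        eta += a_n * (f - eta)

        # Slow time scale: gradient sample for the current state.
        # Only row 0 of P and f(0, .) depend on theta in this toy model.
        grad = (g[1] - g[0]) + 8.0 * theta if state == 0 else 0.0
        # Projected gradient step keeps theta inside (0, 1).
        theta = min(0.95, max(0.05, theta - b_n * grad))

        state = nxt
    return theta
```

A call such as `two_timescale_gradient(0.9, 200_000)` returns the final parameter value. For this toy model the exact minimizer of `average_cost` is (√3 − 1)/2 ≈ 0.366, which can serve as a reference point when experimenting with the step-size schedules.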
Source
Journal of System Simulation (《系统仿真学报》)
CAS
CSCD
Peking University Core Journals (北大核心)
2009, No. 13, pp. 4114-4119 (6 pages)
Funding
National Natural Science Foundation of China (60574065, 60774038)
Keywords
Markov decision process
performance potential
two time-scale
stochastic approximation