Abstract
In large-scale stochastic control problems, value function approximation is a way to overcome the curse of dimensionality; a straightforward instance is to replace the lookup table with a generalized function approximator such as state aggregation. This paper investigates a relative value iteration algorithm with soft state aggregation for average-reward Markov decision processes (MDPs). Under a contraction condition in the span seminorm, the convergence of the proposed algorithm is proved and an error bound for the approximation is given.
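To make the construction concrete, the following is a minimal numerical sketch of relative value iteration with soft state aggregation, in the spirit of the abstract only; the arrays P, r, phi, d, the reference index ref, and the function name are illustrative assumptions, not the paper's notation. Here the span seminorm is span(v) = max_s v(s) - min_s v(s), the seminorm under which the contraction condition is stated.

```python
import numpy as np

def soft_aggregated_rvi(P, r, phi, d, ref=0, tol=1e-8, max_iter=10_000):
    """Relative value iteration with soft state aggregation for an
    average-reward MDP -- an illustrative sketch, not the paper's code.

    P   : (A, S, S) array, P[a, s, s'] = transition probability
    r   : (A, S)    array, r[a, s]     = one-step reward
    phi : (S, K)    soft aggregation matrix; row s is a distribution
                    over K aggregate states, so h(s) ~= (phi @ w)[s]
    d   : (K, S)    disaggregation matrix; row x is a distribution over
                    the original states represented by aggregate x
    ref : index of the reference aggregate pinned to zero each sweep
    """
    K = phi.shape[1]
    w = np.zeros(K)
    for _ in range(max_iter):
        h = phi @ w                        # lift aggregate values to all S states
        Th = (r + P @ h).max(axis=0)       # one Bellman sweep, maximizing over actions
        w_new = d @ Th                     # project updated values back to aggregates
        g = w_new[ref]                     # running estimate of the average reward (gain)
        w_new = w_new - g                  # "relative" normalization: w_new[ref] = 0
        diff = w_new - w
        if diff.max() - diff.min() < tol:  # stop when the span of the change is small
            break
        w = w_new
    return w_new, g
```

If the aggregated sweep is a contraction in the span seminorm, as the paper assumes, the iterates w converge and g approaches the optimal average reward up to the aggregation error quantified by the paper's bound.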
Source
《控制理论与应用》
Indexed in EI, CAS, CSCD, and the Peking University Core Journals list
2000, Issue 3, pp. 415-418 (4 pages)
Control Theory & Applications
Funding
Foundation item: supported by the National Natural Science Foundation of China (69674005).
Keywords
stochastic control
soft state aggregation
relative value iteration algorithm
dynamic programming
Markov decision processes
compact representation
state aggregation
average reward