Abstract
This paper studies the relative value iteration (RVI) algorithm for average-reward Markov decision processes (MDPs). An expression for the contraction factor with respect to the span seminorm is given, and it is proved that, when this factor is less than 1, both the RVI algorithm and the small-step RVI algorithm presented in the paper converge to the optimal solution.
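For orientation, the textbook form of relative value iteration with a span-seminorm stopping rule can be sketched as below. This is a minimal Python sketch and not the paper's own algorithm or its expression for the contraction factor: the update rule, the pairwise-overlap bound used in `span_contraction_factor`, the two-state example data, and all function names are assumptions made for illustration. The small-step variant is not shown.

```python
from typing import List, Tuple

Vector = List[float]
# Assumed layout: P[a][s][j] = transition probability, r[s][a] = one-step reward.
Transitions = List[List[List[float]]]
Rewards = List[List[float]]


def span(v: Vector) -> float:
    """Span seminorm: sp(v) = max(v) - min(v)."""
    return max(v) - min(v)


def bellman(P: Transitions, r: Rewards, v: Vector) -> Vector:
    """Undiscounted optimality operator (Tv)(s) = max_a [r(s,a) + sum_j p(j|s,a) v(j)]."""
    n_states, n_actions = len(v), len(P)
    return [
        max(
            r[s][a] + sum(P[a][s][j] * v[j] for j in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]


def span_contraction_factor(P: Transitions) -> float:
    """Textbook bound: 1 - min over pairs of rows of sum_j min(p_j, q_j)."""
    rows = [row for P_a in P for row in P_a]
    overlap = min(
        sum(min(x, y) for x, y in zip(p, q)) for p in rows for q in rows
    )
    return 1.0 - overlap


def relative_value_iteration(
    P: Transitions,
    r: Rewards,
    ref_state: int = 0,
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> Tuple[float, Vector]:
    """Return (estimated optimal gain, relative value vector)."""
    n_states = len(r)
    v = [0.0] * n_states
    for _ in range(max_iter):
        tv = bellman(P, r, v)
        diff = [tv[s] - v[s] for s in range(n_states)]
        if span(diff) < tol:
            # Tv - v is (nearly) constant; that constant is the gain.
            return 0.5 * (max(diff) + min(diff)), v
        # Subtract the value at the reference state to keep iterates bounded.
        v = [x - tv[ref_state] for x in tv]
    raise RuntimeError("relative value iteration did not converge")


# Hypothetical two-state, two-action example (not taken from the paper).
P_example = [
    [[0.9, 0.1], [0.5, 0.5]],  # action 0
    [[0.1, 0.9], [0.5, 0.5]],  # action 1
]
r_example = [[1.0, 0.0], [2.0, 0.0]]

factor = span_contraction_factor(P_example)
gain, values = relative_value_iteration(P_example, r_example)
```

In this example the computed factor is below 1, which is the kind of condition under which the abstract states convergence; the sketch only illustrates that setting and does not reproduce the paper's proof.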
Source
《运筹学学报》
CSCD
1999, No. 2, pp. 1-9 (9 pages)
Operations Research Transactions
Keywords
span contraction
relative value iteration algorithm
Markov decision processes
contraction mappings
dynamic programming
average reward