Abstract
This paper studies performance optimization algorithms for a class of semi-Markov control processes (SMCPs) with compact action sets under the infinite-horizon average-cost criterion. Using an equivalent Markov process, formulas for the performance potentials and the average-cost optimality equation of SMCPs are derived. A policy iteration algorithm and a value iteration algorithm are then proposed, each of which yields an optimal or suboptimal stationary policy in a finite number of iterations. The convergence of both algorithms is established without assuming that the corresponding iteration operator is an sp-contraction. Finally, a numerical example illustrates the application of the algorithms.
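The average-cost policy iteration named above can be illustrated with a small sketch. It is not the paper's algorithm. It assumes a finite state space, replaces the compact action set by a finite list of candidate actions, and assumes every stationary policy is unichain, so the evaluation equations have a unique solution once h(0) = 0 is fixed. Policy evaluation solves h(i) = c(i,a) - η τ(i,a) + Σ_j p(j|i,a) h(j). Dividing each equation by τ(i,a) gives the Poisson equation f - ηe + Ag = 0 of a continuous-time Markov process with generator A = (P - I)/τ and cost rate f = c/τ, which is one common way to build an equivalent Markov process (assumed here, not taken from the paper). The names `solve_linear`, `evaluate`, `policy_iteration` and the tuple encoding of the model are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical model encoding (not from the paper): model[i][k] describes
# action k in state i as (expected cost c(i,a), mean sojourn time tau(i,a),
# transition row p(.|i,a)). The compact action set is assumed discretized.
Action = Tuple[float, float, List[float]]
Model = List[List[Action]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def evaluate(model: Model, policy: List[int]) -> Tuple[float, List[float]]:
    """Solve h(i) = c - eta*tau + sum_j p_ij h(j) with h(0) = 0 (unichain assumed)."""
    n = len(model)
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i in range(n):
        c, tau, p = model[i][policy[i]]
        a[i][0] = tau  # column 0 holds the unknown eta (h(0) is fixed to 0)
        for j in range(1, n):
            a[i][j] = (1.0 if i == j else 0.0) - p[j]
        b[i] = c
    x = solve_linear(a, b)
    return x[0], [0.0] + x[1:]


def policy_iteration(model: Model, policy: Optional[List[int]] = None,
                     tol: float = 1e-9, max_iter: int = 100):
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    n = len(model)
    policy = list(policy) if policy is not None else [0] * n
    for _ in range(max_iter):
        eta, h = evaluate(model, policy)
        new_policy = []
        for i in range(n):
            def q(k: int, i: int = i) -> float:
                c, tau, p = model[i][k]
                return c - eta * tau + sum(pj * hj for pj, hj in zip(p, h))
            best = min(range(len(model[i])), key=q)
            # Keep the current action unless another one is strictly better.
            new_policy.append(policy[i] if q(policy[i]) <= q(best) + tol else best)
        if new_policy == policy:
            return policy, eta, h
        policy = new_policy
    return policy, eta, h


# Two-state cycle 0 -> 1 -> 0; state 0 offers a short cheap action
# and a long costlier one.
example: Model = [
    [(2.0, 1.0, [0.0, 1.0]), (6.0, 4.0, [0.0, 1.0])],
    [(1.0, 1.0, [1.0, 0.0])],
]
print(policy_iteration(example)[:2])
```

In the example, the long action in state 0 costs more per visit but less per unit time over a full cycle (7/5 = 1.4 versus 3/2 = 1.5), so the iteration switches to it after one improvement step and then stops.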
Funding
National Natural Science Foundation of China (60274012)
Natural Science Foundation of Anhui Province (01042308)
Keywords
semi-Markov control processes
compact action set
performance potentials
policy iteration
value iteration