Abstract
This paper studies average-criterion optimality in Markov decision processes with a countable state space, nonempty action sets, and an unbounded reward function. It proposes a new set of conditions under which an (ε-)optimal stationary policy exists and under which the average criterion optimality inequality holds whenever the summation in it is well defined.
Source
《运筹学学报》
CSCD
Peking University Core Journals (北大核心)
2002, No. 1, pp. 1-8 (8 pages)
Operations Research Transactions
Funding
The project was supported by the National Natural Science Foundation of China.
Keywords
Markov decision process, average criterion optimality inequality, unbounded rewards, nonempty action sets