Abstract
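This paper considers the mean-variance optimization problem for the discounted model of continuous-time Markov decision processes. The state space and the action space are both assumed to be Polish spaces, and the transition rates and reward rate functions are both unbounded. The optimization objective is to select, within the class of discount-optimal stationary policies, a policy whose variance is minimal. The paper seeks conditions for the existence of a mean-variance optimal policy for Markov decision processes in Polish spaces. Using the first passage decomposition method, it proves that the mean-variance optimization problem can be transformed into an "equivalent" expected discounted optimization problem, and thereby obtains the "optimality equation" for the mean-variance optimization problem, together with the existence of a mean-variance optimal policy and its characterization. Finally, several examples illustrate the non-uniqueness of discount-optimal policies and the existence of mean-variance optimal policies.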
This paper deals with the mean-variance optimization problem for continuous-time Markov decision processes in Polish spaces. The transition and reward rates are allowed to be unbounded, and the paper focuses on an optimality criterion that improves the usual expected (or mean) discounted reward criterion. In particular, we aim to find conditions for the existence of a mean-variance optimal policy in Polish spaces. First, under suitable conditions, we prove that the variance minimization problem can be transformed into an equivalent "discounted-cost" optimization problem by using the so-called "first passage decomposition method". Then, we obtain the so-called mean-variance optimality equation and the existence of a mean-variance optimal policy that minimizes the variance over the set of policies with optimal reward. Finally, we present some examples to illustrate our results.
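The criterion described in the abstract can be written out as the following sketch. The notation is an assumption made here for illustration and is not taken from the paper: $\alpha > 0$ is the discount factor, $r$ the reward rate, $(x_t, a_t)$ the state-action process, $F$ the set of stationary policies, and $E_x^f$ the expectation under policy $f$ with initial state $x$.

```latex
% Illustrative notation only (assumed, not the paper's own symbols).
\begin{align*}
  J &:= \int_0^\infty e^{-\alpha t}\, r(x_t, a_t)\, dt,
  &
  V_\alpha(x, f) &:= E_x^f[J],
  \\
  V_\alpha^*(x) &:= \sup_{f \in F} V_\alpha(x, f),
  &
  F^* &:= \{ f \in F : V_\alpha(x, f) = V_\alpha^*(x) \ \text{for all } x \},
  \\
  \sigma^2(x, f) &:= E_x^f[J^2] - V_\alpha(x, f)^2 .
\end{align*}
% A policy f^* in F^* is mean-variance optimal if
\[
  \sigma^2(x, f^*) = \inf_{f \in F^*} \sigma^2(x, f) \quad \text{for all } x .
\]
```

Under this notation, $V_\alpha(x, f) = V_\alpha^*(x)$ for every $f \in F^*$, so minimizing the variance over $F^*$ amounts to minimizing the second moment $E_x^f[J^2]$ over $F^*$. This is consistent with the abstract's statement that the variance minimization problem reduces to an equivalent discounted optimization problem, although the paper's precise formulation may differ.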
Source
《中国科学:数学》
CSCD
Peking University Core Journals
2014, No. 8, pp. 883-898 (16 pages)
Scientia Sinica:Mathematica
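Funding
National Natural Science Foundation of China (Grant No. 11201182)
Fundamental Research Funds for the Central Universities (Grant No. 21612314)
Open Fund of the Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University (Grant No. 201206010)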