A mean-variance optimization problem for continuous-time Markov decision processes

Abstract: This paper studies the mean-variance optimization problem for the discounted model of continuous-time Markov decision processes. The state and action spaces are Polish spaces, and the transition rates and reward rates may be unbounded. The optimality criterion refines the usual expected (or mean) discounted reward criterion: among the discount-optimal stationary policies, we seek one with minimal variance. In particular, we aim to find conditions for the existence of a mean-variance optimal policy in Polish spaces. First, under suitable conditions and using the first passage decomposition method, we prove that the variance minimization problem can be transformed into an "equivalent" expected discounted-cost optimization problem. We then obtain the mean-variance optimality equation, the existence of a mean-variance optimal policy that minimizes the variance over the set of policies with optimal reward, and a characterization of such policies. Finally, we present several examples illustrating the non-uniqueness of discount-optimal policies and the existence of mean-variance optimal policies.
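The idea in the abstract can be illustrated on a toy finite-state model. The sketch below is not taken from the paper, which works in Polish spaces with unbounded rates. It assumes a finite state space, a fixed stationary policy with generator `Q` and reward rates `r`, and a discount rate `alpha`. Under these assumptions the mean `V` solves `(alpha*I - Q) V = r`, and the second moment `M` of the discounted reward solves `(2*alpha*I - Q) M = 2*r*V`. That second equation is itself a discounted problem, with rate `2*alpha` and "cost" `2*r*V`. All names and numbers (`solve`, `discounted_mean_and_variance`, the three-state example) are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(n):
            if i != col:
                factor = M[i][col] / M[col][col]
                M[i] = [x - factor * y for x, y in zip(M[i], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def discounted_mean_and_variance(Q, r, alpha):
    """Mean and variance of the total discounted reward under a fixed policy.

    Q is the transition-rate matrix (rows sum to zero), r the reward rates,
    alpha > 0 the discount rate. Finite state space is an assumption here.
    """
    n = len(r)
    # Mean: (alpha*I - Q) V = r
    A1 = [[(alpha if i == j else 0.0) - Q[i][j] for j in range(n)]
          for i in range(n)]
    V = solve(A1, r)
    # Second moment: (2*alpha*I - Q) M = 2 * r * V (a discounted problem
    # with doubled discount rate and modified "cost" 2*r*V)
    A2 = [[(2 * alpha if i == j else 0.0) - Q[i][j] for j in range(n)]
          for i in range(n)]
    M = solve(A2, [2 * r[i] * V[i] for i in range(n)])
    var = [M[i] - V[i] ** 2 for i in range(n)]
    return V, var


alpha = 1.0
r = [1.0, 0.0, 2.0]  # reward rates in states 0, 1, 2

# Policy a: stay in state 0 forever (no transitions anywhere).
Q_a = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
# Policy b: leave state 0 at rate 1, landing in absorbing state 1 or 2
# with equal probability.
Q_b = [[-1.0, 0.5, 0.5], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

V_a, var_a = discounted_mean_and_variance(Q_a, r, alpha)
V_b, var_b = discounted_mean_and_variance(Q_b, r, alpha)
print("policy a: mean", V_a[0], "variance", var_a[0])
print("policy b: mean", V_b[0], "variance", var_b[0])
```

Starting from state 0, both hypothetical policies earn the same expected discounted reward, so the mean criterion alone cannot separate them. Policy a has zero variance and policy b has variance of about 1/3, so a mean-variance criterion would prefer policy a.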

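Authors: Ye Liu-er (叶柳儿), Huang Xiangxiang (黄香香)

Affiliations: Department of Statistics, College of Economics, Jinan University; School of Mathematics and Computational Science, Sun Yat-sen University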

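Source: Scientia Sinica: Mathematica (《中国科学：数学》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 8, pp. 883-898 (16 pages)

Funding: Supported by the National Natural Science Foundation of China (Grant No. 11201182), the Fundamental Research Funds for the Central Universities (Grant No. 21612314), and the Open Fund of the Guangdong Province Key Laboratory of Computational Science at Sun Yat-sen University (Grant No. 201206010)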

Keywords: continuous-time Markov decision processes, discounted optimality, variance minimization, mean-variance optimal policy

Classification: O211.62 [Science: Probability Theory and Mathematical Statistics]

