STRONG N-DISCOUNT AND FINITE-HORIZON OPTIMALITY FOR CONTINUOUS-TIME MARKOV DECISION PROCESSES

Abstract: This paper studies the strong n-discount (n = -1, 0) and finite-horizon criteria for continuous-time Markov decision processes in Polish spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. Under mild conditions, the authors prove the existence of strong n-discount (n = -1, 0) optimal stationary policies by developing two equivalence relations: one is between the standard expected average reward and strong -1-discount optimality, and the other is between the bias and strong 0-discount optimality. The authors also prove the existence of an optimal policy for a finite-horizon control problem by developing an interesting characterization of a canonical triplet.

Authors: ZHU Quanxin, GUO Xianping

Affiliations: School of Mathematical Sciences and Institute of Finance and Statistics; School of Mathematics and Computational Science

Source: Journal of Systems Science & Complexity (系统科学与复杂性学报（英文版）), indexed in SCIE, EI, and CSCD; 2014, No. 5, pp. 1045-1063 (19 pages)

Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61374080 and 61374067, the Natural Science Foundation of Zhejiang Province under Grant No. LY12F03010, the Natural Science Foundation of Ningbo under Grant No. 2012A610032, and a Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.

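Keywords: continuous-time Markov decision process; expected average reward criterion; finite-horizon optimality; Polish space; strong n-discount optimality. Additional index terms (translated from Chinese): Markov decision process; continuous time; discount optimality; stationary policy; control problem; optimal policy; horizon; reward rate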

Classification number: O225 [Science: Operations Research and Cybernetics]
