Abstract
This paper studies the strong n-discount (n = −1, 0) and finite-horizon criteria for continuous-time Markov decision processes in Polish spaces. The transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. Under mild conditions, the authors prove the existence of strong n-discount (n = −1, 0) optimal stationary policies by establishing two equivalence relations: one between standard expected average reward optimality and strong (−1)-discount optimality, and the other between bias optimality and strong 0-discount optimality. The authors also prove the existence of an optimal policy for a finite-horizon control problem by developing a characterization of a canonical triplet.
Funding
Supported by the National Natural Science Foundation of China under Grant Nos. 61374080 and 61374067
the Natural Science Foundation of Zhejiang Province under Grant No. LY12F03010
the Natural Science Foundation of Ningbo under Grant No. 2012A610032
Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions