This paper focuses on the constrained optimality problem (COP) of first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multi-constraints, state-d...This paper focuses on the constrained optimality problem (COP) of first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multi-constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of a so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear programming on the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear programming, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results.展开更多
This paper is concerned with the convergence of a sequence of discrete-time Markov decision processes(DTMDPs)with constraints,state-action dependent discount factors,and possibly unbounded costs.Using the convex analy...This paper is concerned with the convergence of a sequence of discrete-time Markov decision processes(DTMDPs)with constraints,state-action dependent discount factors,and possibly unbounded costs.Using the convex analytic approach under mild conditions,we prove that the optimal values and optimal policies of the original DTMDPs converge to those of the"limit"one.Furthermore,we show that any countablestate DTMDP can be approximated by a sequence of finite-state DTMDPs,which are constructed using the truncation technique.Finally,we illustrate the approximation by solving a controlled queueing system numerically,and give the corresponding error bound of the approximation.展开更多
This paper attempts to study the optimal stopping time for semi- Markov processes (SMPs) under the discount optimization criteria with unbounded cost rates. In our work, we introduce an explicit construction of the eq...This paper attempts to study the optimal stopping time for semi- Markov processes (SMPs) under the discount optimization criteria with unbounded cost rates. In our work, we introduce an explicit construction of the equivalent semi-Markov decision processes (SMDPs). The equivalence is embodied in the expected discounted cost functions of SMPs and SMDPs, that is, every stopping time of SMPs can induce a policy of SMDPs such that the value functions are equal, and vice versa. The existence of the optimal stopping time of SMPs is proved by this equivalence relation. Next, we give the optimality equation of the value function and develop an effective iterative algorithm for computing it. Moreover, we show that the optimal and ε-optimal stopping time can be characterized by the hitting time of the special sets. Finally, to illustrate the validity of our results, an example of a maintenance system is presented in the end.展开更多
This paper studies denumerable continuous-time Markov decision processes with expected total reward criteria. The authors first study the unconstrained model with possible unbounded transition rates, and give suitable...This paper studies denumerable continuous-time Markov decision processes with expected total reward criteria. The authors first study the unconstrained model with possible unbounded transition rates, and give suitable conditions on the controlled system's primitive data under which the authors show the existence of a solution to the total reward optimality equation and also the existence of an optimal stationary policy. Then, the authors impose a constraint on an expected total cost, and consider the associated constrained model. Basing on the results about the unconstrained model and using the Lagrange multipliers approach, the authors prove the existence of constrained-optimal policies under some additional conditions. Finally, the authors apply the results to controlled queueing systems.展开更多
Proposition 5.5.6(ii)in the book Markov Chains and Stochastic Stability(2nd ed,Cambridge Univ.Press,2009)has been used in the proof of a theorem about ergodicity of Markov chains.Unfortunately,an example in this paper...Proposition 5.5.6(ii)in the book Markov Chains and Stochastic Stability(2nd ed,Cambridge Univ.Press,2009)has been used in the proof of a theorem about ergodicity of Markov chains.Unfortunately,an example in this paper shows that this proposition is not always true.Thus,a correction of this proposition is provided.展开更多
基金This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61374067, 41271076).
文摘This paper focuses on the constrained optimality problem (COP) of first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multi-constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of a so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear programming on the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear programming, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results.
基金supported by National Natural Science Foundation of China (Grant Nos. 61374067 and 41271076)
文摘This paper is concerned with the convergence of a sequence of discrete-time Markov decision processes(DTMDPs)with constraints,state-action dependent discount factors,and possibly unbounded costs.Using the convex analytic approach under mild conditions,we prove that the optimal values and optimal policies of the original DTMDPs converge to those of the"limit"one.Furthermore,we show that any countablestate DTMDP can be approximated by a sequence of finite-state DTMDPs,which are constructed using the truncation technique.Finally,we illustrate the approximation by solving a controlled queueing system numerically,and give the corresponding error bound of the approximation.
基金This work was supported in part by the National Natural Science Foundation of China(Grant Nos.11931018,61773411,11701588,11961005)the Guangdong Basic and Applied Basic Research Foundation(Grant No.2020B1515310021).
文摘This paper attempts to study the optimal stopping time for semi- Markov processes (SMPs) under the discount optimization criteria with unbounded cost rates. In our work, we introduce an explicit construction of the equivalent semi-Markov decision processes (SMDPs). The equivalence is embodied in the expected discounted cost functions of SMPs and SMDPs, that is, every stopping time of SMPs can induce a policy of SMDPs such that the value functions are equal, and vice versa. The existence of the optimal stopping time of SMPs is proved by this equivalence relation. Next, we give the optimality equation of the value function and develop an effective iterative algorithm for computing it. Moreover, we show that the optimal and ε-optimal stopping time can be characterized by the hitting time of the special sets. Finally, to illustrate the validity of our results, an example of a maintenance system is presented in the end.
基金supported by the National Natural Science Foundation of China under Grant Nos.10925107 and 60874004
文摘This paper studies denumerable continuous-time Markov decision processes with expected total reward criteria. The authors first study the unconstrained model with possible unbounded transition rates, and give suitable conditions on the controlled system's primitive data under which the authors show the existence of a solution to the total reward optimality equation and also the existence of an optimal stationary policy. Then, the authors impose a constraint on an expected total cost, and consider the associated constrained model. Basing on the results about the unconstrained model and using the Lagrange multipliers approach, the authors prove the existence of constrained-optimal policies under some additional conditions. Finally, the authors apply the results to controlled queueing systems.
基金This work was supported in part by the National Natural Science Foundation of China(Grant Nos.11931018,61773411)the National Natural Science Foundation of Guangdong Province,China(Grant No.2021A1515010057)and Guangdong Province Key Laboratory of Computational Science at the Sun Yat-sen University,China(2020B1212060032).
文摘Proposition 5.5.6(ii)in the book Markov Chains and Stochastic Stability(2nd ed,Cambridge Univ.Press,2009)has been used in the proof of a theorem about ergodicity of Markov chains.Unfortunately,an example in this paper shows that this proposition is not always true.Thus,a correction of this proposition is provided.