Decision makers often need a performance guarantee that holds with sufficiently high probability. Such problems can be modelled using a discrete time Markov decision process (MDP) with a probability criterion for first achieving a target value. The objective is to find a policy that maximizes the probability of the total discounted reward exceeding a target value in the preceding stages. We show that our formulation cannot be described by earlier models with standard criteria. We provide the properties of the objective functions, optimal value functions and optimal policies. An algorithm for computing the optimal policies for the finite horizon case is given. In this stochastic stopping model, we prove that there exists an optimal deterministic and stationary policy and that the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of an ε-optimal policy for a finite state space. We give an example for the reliability of satellite systems using the above theory. Finally, we extend these results to more general cases.
Funding: We thank the referees for their valuable comments and suggestions. This work was supported by the National Natural Science Foundation of China (Grant No. 19871046).
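The abstract mentions an algorithm for the finite horizon case but does not spell it out. As a rough illustration only, and not the authors' algorithm, the following sketch uses a standard device for target-level criteria: the state is augmented with the remaining target x, and backward induction is run on the pair (state, x). It assumes rewards depend on the current state and action, that the target counts as achieved as soon as the accumulated discounted reward is at least the target, and that the process stops at that point. The names `make_solver`, `P`, `R` and `beta` are hypothetical.

```python
from functools import lru_cache


def make_solver(P, R, beta):
    """Return value(n, i, x): the maximal probability of achieving the
    remaining target x within n stages from state i, plus a best first action.

    P[i][a][j] : transition probability from state i to j under action a
    R[i][a]    : one-stage reward in state i under action a (assumed form)
    beta       : discount factor in (0, 1]
    """

    @lru_cache(maxsize=None)
    def value(n, i, x):
        if x <= 0:          # target already achieved: stop with success
            return 1.0, None
        if n == 0:          # no stages left and target not achieved
            return 0.0, None
        best_val, best_act = -1.0, None
        for a in range(len(P[i])):
            # Remaining target seen from the next stage, rescaled by beta.
            next_x = (x - R[i][a]) / beta
            q = sum(p * value(n - 1, j, next_x)[0]
                    for j, p in enumerate(P[i][a]) if p > 0)
            if q > best_val:
                best_val, best_act = q, a
        return best_val, best_act

    return value


# Toy model (made up for illustration): state 0 is the start,
# state 1 pays 4 per stage forever, state 2 pays nothing.
P = [
    [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]],  # state 0: safe action, risky action
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],  # state 1: absorbing
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # state 2: absorbing
]
R = [[1.0, 0.0], [4.0, 4.0], [0.0, 0.0]]
value = make_solver(P, R, beta=0.5)

print(value(2, 0, 1.5))  # safe action suffices for a low target
print(value(2, 0, 2.0))  # a higher target forces the risky action
```

In this toy model the best first action changes with the target, which illustrates why a policy for this criterion generally needs to track the remaining target and not just the current state.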