摘要
This paper investigates the Borel state space semi-Markov decision process (SMDP) with the criterion of expected total rewards in a semi-Markov environment. It describes a system which behaves like a SMDP except that the system is influenced by its environment modeled by a semi-Markov process. We transform the SMDP in a semiMarkov environment into an equivalent discrete time Markov decision process under the condition that rewards are all positive or all negative, and obtain the optimality equation and some properties for it.
This paper investigates the Borel state space semi-Markov decision process (SMDP) with the criterion of expected total rewards in a semi-Markov environment. It describes a system which behaves like a SMDP except that the system is influenced by its environment modeled by a semi-Markov process. We transform the SMDP in a semiMarkov environment into an equivalent discrete time Markov decision process under the condition that rewards are all positive or all negative, and obtain the optimality equation and some properties for it.