Funding: Supported by the Natural Science Foundation of China (Nos. 60874004, 60736028) and the Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2010).
Abstract: This paper considers a first passage model for discounted semi-Markov decision processes with denumerable states and nonnegative costs. The criterion to be optimized is the expected discounted cost incurred during a first passage time to a given target set. We first construct a semi-Markov decision process under a given semi-Markov decision kernel and a policy. Then, using a minimum nonnegative solution approach, we prove that the value function satisfies the optimality equation and that an optimal (or ε-optimal) stationary policy exists under suitable conditions. Further, we give some properties of optimal policies. In addition, a value iteration algorithm for computing the value function and optimal policies is developed, and an example is given. Finally, it is shown that our model extends the first passage models for both discrete-time and continuous-time Markov decision processes.
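As a rough illustration of the value iteration algorithm mentioned in the abstract, the sketch below runs the iteration on a finite truncation of the state space, starting from the zero function so that the iterates increase monotonically to the minimum nonnegative solution. The data interface (cost, trans, disc, the last being the expected discount factor over a random sojourn) is hypothetical and not taken from the paper.

```python
def first_passage_value_iteration(states, actions, target, cost, trans, disc,
                                  tol=1e-8, max_iter=10_000):
    """Value iteration sketch for the discounted first-passage criterion.

    Hypothetical interface (not from the paper):
      cost[i][a]     -- nonnegative one-step cost of action a in state i
      trans[i][a][j] -- embedded transition probability from i to j under a
      disc[i][a]     -- expected discount over the sojourn, E[exp(-alpha*T)]
    Starting from V = 0, the iterates increase monotonically toward the
    minimum nonnegative solution of the optimality equation.
    """
    V = {i: 0.0 for i in states}
    for _ in range(max_iter):
        V_new = {}
        for i in states:
            if i in target:
                V_new[i] = 0.0  # first passage completed: no further cost
            else:
                V_new[i] = min(
                    cost[i][a]
                    + disc[i][a] * sum(trans[i][a][j] * V[j] for j in states)
                    for a in actions[i]
                )
        if max(abs(V_new[i] - V[i]) for i in states) < tol:
            return V_new
        V = V_new
    return V
```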
Abstract: An alpha-uniformized Markov chain is defined via the concept of an equivalent infinitesimal generator for a semi-Markov decision process (SMDP) with both average and discounted criteria. According to the relations between their performance measures and performance potentials, the optimization of an SMDP can be realized by simulating the chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and a performance error bound is derived that accounts for the approximation error and the improvement error in each iteration step. The obtained results can be extended to Markov systems and have broad applicability. Finally, a numerical example is provided.
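For readers unfamiliar with uniformization, the following minimal sketch shows the standard construction of a uniformized chain from an infinitesimal generator, P = I + A/alpha with alpha >= max_i |a_ii|; the paper's alpha-uniformized chain is built from an equivalent generator in this spirit, and the generator below is purely illustrative.

```python
import numpy as np

def uniformize(A, alpha=None):
    """Standard uniformization: given an infinitesimal generator A (rows sum
    to zero, nonnegative off-diagonal entries) and a rate
    alpha >= max_i |A[i, i]|, the matrix P = I + A / alpha is a stochastic
    transition matrix whose discrete-time chain mimics the continuous one."""
    A = np.asarray(A, dtype=float)
    if alpha is None:
        alpha = float(np.max(-np.diag(A)))  # smallest admissible rate
    return np.eye(A.shape[0]) + A / alpha, alpha

# Purely illustrative two-state generator (not from the paper)
A = [[-3.0, 3.0],
     [1.0, -1.0]]
P, alpha = uniformize(A)
print(alpha)  # 3.0
print(P)      # [[0.    1.   ]
              #  [0.333 0.667]]
```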
Funding: Supported by the National Natural Science Foundation of China (No. 71701008).
Abstract: For critical engineering systems such as aircraft and aerospace vehicles, accurate Remaining Useful Life (RUL) prediction not only means cost saving but, more importantly, is of great significance in ensuring system reliability and preventing disaster. RUL is affected not only by a system's intrinsic deterioration, but also by the operational conditions under which the system is operating. This paper proposes an RUL prediction approach to estimate the mean RUL of a continuously degrading system that operates under dynamic operational conditions and is subject to condition monitoring at short, equidistant intervals. The dynamic nature of the operational conditions is described by a discrete-time Markov chain, and their influence on the degradation signal is quantified by degradation rates and signal jumps in the degradation model. The uniqueness of our proposed approach lies in formulating the RUL prediction problem in a semi-Markov decision process framework, by which the system mean RUL can be obtained through the solution of a limited number of equations. To extend the use of our approach in real applications, different failure standards for different operational conditions are also considered. The application and effectiveness of the approach are illustrated by a turbofan engine dataset and a comparison with existing results for the same dataset.
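A hedged sketch of the "limited number of equations" idea in its simplest discrete form: if the joint degradation/operational-condition states form a finite Markov chain observed at the monitoring intervals, with transient-to-transient transition block Q and failure absorbing, the vector of mean steps to failure solves (I - Q)m = 1. The chain below is illustrative and is not the paper's engine model.

```python
import numpy as np

def mean_steps_to_failure(Q):
    """Mean number of monitoring intervals until absorption in the failure
    states, given the transient-to-transient transition block Q of a
    discrete-time Markov chain: solve (I - Q) m = 1."""
    n = Q.shape[0]
    return np.linalg.solve(np.eye(n) - Q, np.ones(n))

# Illustrative 3-state degradation chain (healthy -> worn -> critical);
# the missing row mass is the per-step probability of direct failure.
Q = np.array([[0.90, 0.08, 0.02],
              [0.00, 0.85, 0.10],
              [0.00, 0.00, 0.70]])
print(mean_steps_to_failure(Q))  # mean RUL, in monitoring intervals, per state
```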
Abstract: This paper investigates the Borel state space semi-Markov decision process (SMDP) with the criterion of expected total rewards in a semi-Markov environment. It describes a system that behaves like an SMDP except that the system is influenced by its environment, modeled by a semi-Markov process. We transform the SMDP in a semi-Markov environment into an equivalent discrete-time Markov decision process under the condition that rewards are all positive or all negative, and obtain the optimality equation and some properties of it.
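The transformation is of the embedded-chain type; as an illustrative form (the paper's exact Borel-space construction may differ), adjoining the environment state e to the system state s and embedding at decision epochs yields a discrete-time optimality equation:

```latex
% Illustrative embedded optimality equation for the total-reward criterion;
% r is the expected reward accumulated until the next decision epoch and
% p the embedded transition law on the joint state (s, e); on a Borel state
% space the sum is read as an integral.
V(s,e) = \sup_{a \in A(s)} \Big\{ r(s,e,a)
         + \sum_{(s',e')} p\big((s',e') \mid (s,e), a\big)\, V(s',e') \Big\}
```

The condition that rewards are all positive or all negative ensures the expected total reward is well defined even when V is unbounded.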