Based on the theory of Markov performance potentials and the neuro-dynamic programming (NDP) methodology, we study a simulation optimization algorithm for a class of continuous-time Markov decision processes (CTMDPs) under randomized stationary policies. The proposed algorithm estimates the gradient of the average-cost performance measure with respect to the policy parameters by transforming the continuous-time Markov process into a uniform Markov chain and simulating a single sample path of that chain. The goal is to find a suboptimal randomized stationary policy. The algorithm can meet the performance optimization needs of many difficult systems with large-scale state spaces. Finally, a numerical example for a controlled Markov process is provided.
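The transformation of a continuous-time Markov process into a uniform Markov chain mentioned in the abstract can be illustrated with the standard uniformization construction, P = I + Q/λ, where Q is the generator matrix and λ is at least the largest exit rate. The sketch below is not the paper's algorithm: the function names `uniformize` and `stationary` and the two-state generator `Q` are hypothetical, chosen only to show that the resulting discrete-time chain keeps the stationary distribution of the original process.

```python
def uniformize(Q, rate=None):
    """Return (P, rate) with P = I + Q / rate for a generator matrix Q.

    Q is a list of rows whose entries sum to zero; rate must be at least
    the largest exit rate max_i(-Q[i][i]) so that P is a stochastic matrix.
    """
    n = len(Q)
    max_exit = max(-Q[i][i] for i in range(n))
    if rate is None:
        rate = max_exit
    if rate <= 0 or rate < max_exit:
        raise ValueError("rate must be positive and >= the largest exit rate")
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    return P, rate


def stationary(P, iters=1000):
    """Approximate the stationary distribution of P by power iteration.

    Assumes the chain is irreducible and aperiodic.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical two-state generator (illustrative values, not from the paper).
Q = [[-2.0, 2.0],
     [3.0, -3.0]]
P, rate = uniformize(Q)
pi = stationary(P)
```

For this generator the stationary distribution of the continuous-time process satisfies πQ = 0, and the same π satisfies πP = π for the uniformized chain, which is why a single sample path of the discrete chain can stand in for the original process when estimating long-run averages.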