Markov decision process(MDP)offers a general framework for modelling sequential decision making where outcomes are random.In particular,it serves as a mathematical framework for reinforcement learning.This paper intro...Markov decision process(MDP)offers a general framework for modelling sequential decision making where outcomes are random.In particular,it serves as a mathematical framework for reinforcement learning.This paper introduces an extension of MDP,namely quantum MDP(q MDP),that can serve as a mathematical model of decision making about quantum systems.We develop dynamic programming algorithms for policy evaluation and finding optimal policies for q MDPs in the case of finite-horizon.The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.展开更多
基金partly supported by National Key R&D Program of China(No.2018YFA0306701)the Australian Research Council(Nos.DP160101652 and DP180100691)+1 种基金National Natural Science Foundation of China(No.61832015)the Key Research Program of Frontier Sciences,Chinese Academy of Sciences。
文摘Markov decision process(MDP)offers a general framework for modelling sequential decision making where outcomes are random.In particular,it serves as a mathematical framework for reinforcement learning.This paper introduces an extension of MDP,namely quantum MDP(q MDP),that can serve as a mathematical model of decision making about quantum systems.We develop dynamic programming algorithms for policy evaluation and finding optimal policies for q MDPs in the case of finite-horizon.The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.