Abstract
The Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDP, namely quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
Funding
This work was partly supported by the National Key R&D Program of China (No. 2018YFA0306701), the Australian Research Council (Nos. DP160101652 and DP180100691), the National Natural Science Foundation of China (No. 61832015), and the Key Research Program of Frontier Sciences, Chinese Academy of Sciences.