Funding: The work of B. Pang and Z.-P. Jiang has been supported in part by the National Science Foundation (No. ECCS-1501044).
Abstract: This paper studies data-driven, learning-based methods for the finite-horizon optimal control of linear time-varying discrete-time systems. First, a novel finite-horizon Policy Iteration (PI) method for linear time-varying discrete-time systems is presented, and its connections with existing infinite-horizon PI methods are discussed. Then, both data-driven off-policy PI and Value Iteration (VI) algorithms are derived to find approximately optimal controllers when the system dynamics are completely unknown. Under mild conditions, the proposed data-driven off-policy algorithms converge to the optimal solution. Finally, the effectiveness and feasibility of the developed methods are validated on a practical example of spacecraft attitude control.
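The abstract does not state the recursions, so the following is only a rough illustration of what finite-horizon PI means in the model-based linear-quadratic case. It assumes dynamics x_{k+1} = A_k x_k + B_k u_k with a single input, stage cost x'Qx + r u^2, and terminal cost x_N' Q_N x_N; these choices, the helper names, and the toy numbers are assumptions, not the authors' algorithm, and the data-driven off-policy PI/VI variants that avoid knowing A_k, B_k are not shown. Evaluation and improvement alternate over the whole horizon. Because each sweep makes at least one more trailing gain optimal, N sweeps reproduce the backward Riccati recursion, which serves as a check.

```python
# Hypothetical sketch: model-based finite-horizon policy iteration for an assumed LTV LQ problem.
# Matrices are plain nested lists; the input is scalar (m = 1), so the gain "inverse" is a division.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def lin(X, Y, a=1.0, b=1.0):
    # Elementwise a*X + b*Y.
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def improve(A, B, P_next, r):
    # Policy improvement: K = (r + B' P_{k+1} B)^{-1} B' P_{k+1} A, returned as a 1 x n row.
    BtP = matmul(transpose(B), P_next)
    s = r + matmul(BtP, B)[0][0]
    return [[v / s for v in matmul(BtP, A)[0]]]


def evaluate(As, Bs, Ks, Q, r, QN):
    # Policy evaluation for u_k = -K_k x_k, backward in time:
    # P_N = Q_N,  P_k = Q + r K_k' K_k + (A_k - B_k K_k)' P_{k+1} (A_k - B_k K_k).
    N = len(As)
    Ps = [None] * (N + 1)
    Ps[N] = QN
    for k in reversed(range(N)):
        Acl = lin(As[k], matmul(Bs[k], Ks[k]), 1.0, -1.0)
        stage = lin(Q, matmul(transpose(Ks[k]), Ks[k]), 1.0, r)
        Ps[k] = lin(stage, matmul(matmul(transpose(Acl), Ps[k + 1]), Acl))
    return Ps


def policy_iteration(As, Bs, Q, r, QN, iters):
    # On a finite horizon every gain sequence has finite cost, so start from K_k = 0.
    Ks = [[[0.0] * len(Q)] for _ in As]
    for _ in range(iters):
        Ps = evaluate(As, Bs, Ks, Q, r, QN)
        Ks = [improve(As[k], Bs[k], Ps[k + 1], r) for k in range(len(As))]
    return Ks, evaluate(As, Bs, Ks, Q, r, QN)


def riccati(As, Bs, Q, r, QN):
    # Reference solution: P_k = Q + A'PA - A'PB (r + B'PB)^{-1} B'PA with P = P_{k+1}.
    N = len(As)
    Ps = [None] * (N + 1)
    Ps[N] = QN
    for k in reversed(range(N)):
        K = improve(As[k], Bs[k], Ps[k + 1], r)
        AtP = matmul(transpose(As[k]), Ps[k + 1])
        Ps[k] = lin(lin(Q, matmul(AtP, As[k])), matmul(matmul(AtP, Bs[k]), K), 1.0, -1.0)
    return Ps


# Toy time-varying system (illustrative numbers, not taken from the paper).
N = 6
As = [[[1.0, 0.1], [0.0, 1.0 + 0.05 * k]] for k in range(N)]
Bs = [[[0.0], [0.1 + 0.02 * k]] for k in range(N)]
Q, QN, r = [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]], 0.5
Ks, Ps = policy_iteration(As, Bs, Q, r, QN, iters=N)
P_ref = riccati(As, Bs, Q, r, QN)
gap = max(abs(Ps[k][i][j] - P_ref[k][i][j])
          for k in range(N + 1) for i in range(2) for j in range(2))
```

Here `gap` should be at floating-point rounding level. Running fewer than N sweeps generally leaves the early-stage gains suboptimal, which is the finite-horizon counterpart of PI needing repeated sweeps in the infinite-horizon case.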
Funding: This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 11931018, 61773411, 11701588, 11961005) and the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2020B1515310021).
Abstract: This paper studies the optimal stopping time for semi-Markov processes (SMPs) under the discounted optimization criterion with unbounded cost rates. We introduce an explicit construction of equivalent semi-Markov decision processes (SMDPs). The equivalence is embodied in the expected discounted cost functions of SMPs and SMDPs: every stopping time of an SMP induces a policy of the SMDP with an equal value function, and vice versa. This equivalence relation is used to prove the existence of an optimal stopping time for SMPs. Next, we give the optimality equation of the value function and develop an effective iterative algorithm for computing it. Moreover, we show that the optimal and ε-optimal stopping times can be characterized as hitting times of special sets. Finally, an example of a maintenance system is presented to illustrate the validity of our results.
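As a simplified illustration of an optimality equation solved by iteration together with a hitting-time stopping rule, the sketch below assumes a finite state space, stopping decisions only at jump epochs, deterministic sojourn times d(x), a bounded cost rate c(x) in each state, and a lump stopping cost g(x). It does not reproduce the paper's SMDP construction or its unbounded-cost setting, and all function names and the toy maintenance chain are hypothetical. Under these assumptions each sojourn contributes a discount factor β(x) = e^{-α d(x)} and a discounted running cost c(x)(1 − β(x))/α. Value iteration then solves V(x) = min{g(x), run(x) + β(x) Σ_y P(x,y) V(y)}, and stopping at the first visit to {x : g(x) ≤ V(x)} is the corresponding optimal rule.

```python
import math

# Hypothetical sketch: discounted optimal stopping for a finite semi-Markov-type chain,
# with stopping allowed only at jump epochs (an assumption made for simplicity).


def sojourn_terms(c, d, alpha):
    # Deterministic sojourn time d[x] (illustrative choice):
    # beta[x] = E[exp(-alpha T_x)] = exp(-alpha d[x]),
    # run[x]  = E[int_0^{T_x} exp(-alpha t) c[x] dt] = c[x] (1 - beta[x]) / alpha.
    beta = [math.exp(-alpha * dx) for dx in d]
    run = [cx * (1.0 - bx) / alpha for cx, bx in zip(c, beta)]
    return beta, run


def stopping_value_iteration(P, beta, run, g, tol=1e-12, max_iter=100_000):
    # Iterate V(x) = min{ g(x), run(x) + beta(x) * sum_y P(x, y) V(y) };
    # the map is a sup-norm contraction because max(beta) < 1.
    n = len(g)
    V = list(g)  # start from "stop immediately"
    for _ in range(max_iter):
        V_new = [min(g[x], run[x] + beta[x] * sum(P[x][y] * V[y] for y in range(n)))
                 for x in range(n)]
        done = max(abs(a - b) for a, b in zip(V_new, V)) < tol
        V = V_new
        if done:
            break
    return V


def stopping_set(V, g, eps=0.0):
    # States where stopping is chosen: g(x) <= V(x) + eps (eps > 0 enlarges the set).
    return {x for x in range(len(g)) if g[x] <= V[x] + eps}


def hitting_index(path, S):
    # First jump epoch at which the embedded chain visits S (None if it never does).
    return next((i for i, x in enumerate(path) if x in S), None)


# Toy "maintenance" chain: state = wear level, stopping = replacement at cost g.
P = [[0.6, 0.4, 0.0, 0.0],
     [0.0, 0.6, 0.4, 0.0],
     [0.0, 0.0, 0.6, 0.4],
     [0.0, 0.0, 0.0, 1.0]]
c = [0.5, 1.0, 3.0, 6.0]   # operating cost rates
d = [1.0, 1.0, 0.5, 0.5]   # sojourn durations
g = [20.0] * 4             # replacement (stopping) cost
beta, run = sojourn_terms(c, d, alpha=0.1)
V = stopping_value_iteration(P, beta, run, g)
S = stopping_set(V, g)
print(S)                                   # {2, 3}
print(hitting_index([0, 0, 1, 2, 3], S))   # 3
```

With these numbers, worn states 2 and 3 are cheaper to replace than to keep running, so the rule "replace on first entry to {2, 3}" is the hitting time of the stopping set.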