Funding: This work was supported in part by the Beijing Natural Science Foundation (JQ19013), the National Key Research and Development Program of China (2021ZD0112302), and the National Natural Science Foundation of China (61773373).
Abstract: The core task of tracking control is to make the controlled plant track a desired trajectory. The traditional performance index used in previous studies cannot completely eliminate the tracking error as the number of time steps increases. In this paper, a new cost function is introduced to develop a value-iteration-based adaptive critic framework for the tracking control problem. Unlike in the regulator problem, the iterative value function of the tracking control problem cannot be regarded as a Lyapunov function. A novel stability analysis method is therefore developed to guarantee that the tracking error converges to zero. The discounted iterative scheme under the new cost function is elaborated for the special case of linear systems. Finally, the tracking performance of the present scheme is demonstrated by numerical results and compared with that of traditional approaches.
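As a rough illustration of the mechanics of a discounted value iteration in the linear special case, the sketch below runs a Riccati-type iteration on an assumed tracking-error system with a generic discounted quadratic cost. All matrices, the discount factor, and the initial error are illustrative assumptions; this is not the paper's new cost function or its adaptive critic implementation.

```python
import numpy as np

# Assumed tracking-error dynamics e_{k+1} = A e_k + B v_k (illustrative, not the paper's plant)
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.5]])
Q = np.eye(2)          # tracking-error weight
R = np.eye(1)          # control weight
gamma = 0.95           # discount factor

# Discounted value iteration: V_i(e) = e' P_i e, updated until the recursion settles
P = np.zeros((2, 2))
for i in range(1000):
    S = R + gamma * B.T @ P @ B
    P_next = Q + gamma * A.T @ P @ A - gamma**2 * A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
    if np.max(np.abs(P_next - P)) < 1e-12:
        P = P_next
        break
    P = P_next

K = gamma * np.linalg.solve(R + gamma * B.T @ P @ B, B.T @ P @ A)   # feedback gain, v = -K e

# Roll the closed loop out and watch the tracking error shrink
e = np.array([[1.0], [-0.5]])
for k in range(60):
    e = A @ e + B @ (-K @ e)
print("final tracking-error norm:", float(np.linalg.norm(e)))
```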
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61034002, 61233001, 61273140, 61304086, and 61374105) and the Beijing Natural Science Foundation, China (Grant No. 4132078).
Abstract: A policy iteration algorithm of adaptive dynamic programming (ADP) is developed to solve the optimal tracking control problem for a class of discrete-time chaotic systems. By system transformations, the optimal tracking problem is transformed into an optimal regulation problem. The policy iteration algorithm for discrete-time chaotic systems is first described. Then, the convergence and admissibility properties of the developed policy iteration algorithm are presented, which show that the transformed chaotic system can be stabilized under any of the iterative control laws while the iterative performance index function converges to the optimum. By implementing the policy iteration algorithm via neural networks, the developed optimal tracking control scheme for chaotic systems is verified by simulation.
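A minimal sketch of the underlying policy iteration loop on a regulation problem of the kind obtained after such a transformation is given below. The linear dynamics, weights, and the zero initial gain (taken as the admissible starting law because the assumed plant is already stable) are invented for illustration; the paper itself works with neural-network approximators rather than exact matrices.

```python
import numpy as np

# Assumed regulation problem x_{k+1} = A x_k + B u_k reached after the system transformation
A = np.array([[0.8, 0.6],
              [-0.3, 0.7]])
B = np.array([[0.0],
              [1.0]])
Q, R = np.eye(2), np.eye(1)

def evaluate(K, sweeps=3000):
    """Policy evaluation: V(x) = x' P x for u = -K x, obtained by successive substitution."""
    Acl = A - B @ K
    P = np.zeros((2, 2))
    for _ in range(sweeps):
        P = Q + K.T @ R @ K + Acl.T @ P @ Acl
    return P

K = np.zeros((1, 2))                 # admissible initial control law (A alone is stable here)
for i in range(30):
    P = evaluate(K)                                           # policy evaluation
    K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)     # policy improvement
    if np.max(np.abs(K_new - K)) < 1e-10:
        K = K_new
        break
    K = K_new

print("converged gain:", K)
print("closed-loop spectral radius:", max(abs(np.linalg.eigvals(A - B @ K))))
```

Each improvement step keeps the closed loop stable, which is the admissibility property the abstract refers to.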
Funding: Supported in part by the Fundamental Research Funds for the Central Universities (2022JBZX024) and in part by the National Natural Science Foundation of China (61872037, 61273167).
Abstract: For infinite-horizon optimal control problems of discrete time-varying nonlinear systems, this paper develops a new iterative adaptive dynamic programming algorithm, the discrete-time time-varying (DTTV) policy iteration algorithm. The iterative control law is designed to update the iterative value function, which approximates the optimal performance index function. The admissibility of the iterative control law is analyzed. The results show that the iterative value function converges non-increasingly to the optimal solution of the Bellman equation. To implement the algorithm, neural networks are employed and a new implementation structure is established, which avoids solving the generalized Bellman equation in each iteration. Finally, the optimal control laws for torsional pendulum and inverted pendulum systems are obtained using the DTTV policy iteration algorithm, where the mass and the pendulum bar length are permitted to be time-varying parameters. The effectiveness of the developed method is illustrated by numerical results and comparisons.
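The sketch below illustrates the basic idea of policy iteration with time-indexed value functions and control laws on a small, periodically time-varying linear plant. The period, dynamics, and weights are assumptions made for illustration, and the paper's neural-network implementation structure (which avoids solving the generalized Bellman equation exactly) is not reproduced here.

```python
import numpy as np

T = 4                                          # assumed period of the time-varying dynamics
# Assumed plant x_{k+1} = A_k x_k + B u_k with periodically varying parameters
A_seq = [np.array([[0.7 + 0.05 * np.sin(2 * np.pi * k / T), 0.2],
                   [0.0, 0.6]]) for k in range(T)]
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

K_seq = [np.zeros((1, 2)) for _ in range(T)]   # admissible initial time-varying law
P_seq = [np.zeros((2, 2)) for _ in range(T)]

for it in range(50):
    # Policy evaluation: time-indexed value functions V_k(x) = x' P_k x,
    # obtained by sweeping the coupled equations to a fixed point.
    for _ in range(500):
        for k in range(T):
            Acl = A_seq[k] - B @ K_seq[k]
            P_seq[k] = Q + K_seq[k].T @ R @ K_seq[k] + Acl.T @ P_seq[(k + 1) % T] @ Acl
    # Policy improvement: one gain per time index.
    K_seq = [np.linalg.solve(R + B.T @ P_seq[(k + 1) % T] @ B,
                             B.T @ P_seq[(k + 1) % T] @ A_seq[k]) for k in range(T)]

print("time-varying gains:")
for k, K in enumerate(K_seq):
    print("  k =", k, ":", np.round(K, 4))
```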
Funding: Supported by the National Natural Science Foundation of China (61973228, 61973330).
Abstract: In this paper, we present an optimal neuro-control scheme for continuous-time (CT) nonlinear systems with asymmetric input constraints. Initially, we introduce a discounted cost function for the CT nonlinear systems in order to handle the asymmetric input constraints. Then, we develop the Hamilton-Jacobi-Bellman equation (HJBE) that arises in the discounted-cost optimal control problem. To obtain the optimal neurocontroller, we utilize a critic neural network (CNN) to solve the HJBE under the framework of reinforcement learning. The CNN's weight vector is tuned via the gradient descent approach. Based on the Lyapunov method, we prove that uniform ultimate boundedness of the CNN's weight vector and of the closed-loop system is guaranteed. Finally, we verify the effectiveness of the present optimal neuro-control strategy through simulations of two examples.
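One common way to handle asymmetric input bounds u ∈ [u_min, u_max] is to shift and scale a tanh-type non-quadratic penalty; the sketch below combines that device with gradient-descent tuning of a small polynomial critic on the discounted HJB residual of an assumed scalar plant. The dynamics, features, bounds, and tuning constants are all illustrative assumptions and may differ in detail from the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed scalar plant dx/dt = f(x) + g(x) u with asymmetric bounds u in [u_min, u_max]
u_min, u_max = -0.5, 2.0
d, lam = (u_max + u_min) / 2.0, (u_max - u_min) / 2.0     # offset and half-width of the bounds
f = lambda x: -x + 0.25 * x**3
g = lambda x: 1.0
Qx, Ru, alpha = 1.0, 1.0, 0.1                             # state weight, control weight, discount

def U(u):
    """Non-quadratic control penalty that grows steeply as u approaches either bound."""
    z = (u - d) / lam
    return 2.0 * lam**2 * Ru * (z * np.arctanh(z) + 0.5 * np.log(1.0 - z**2))

phi  = lambda x: np.array([x**2, x**4])                   # critic features, V(x) ~ W . phi(x)
dphi = lambda x: np.array([2.0 * x, 4.0 * x**3])

W = np.zeros(2)                                           # critic weight vector
lr = 0.2
for step in range(30000):
    x = rng.uniform(-1.5, 1.5)                            # sample a training state
    dV = W @ dphi(x)
    u = d - lam * np.tanh(g(x) * dV / (2.0 * lam * Ru))   # bounded policy implied by the critic
    xdot = f(x) + g(x) * u
    e = Qx * x**2 + U(u) + dV * xdot - alpha * (W @ phi(x))    # discounted HJB residual
    grad = dphi(x) * xdot - alpha * phi(x)                     # d(residual)/dW, policy held fixed
    W -= lr * e * grad / (1.0 + grad @ grad)**2                # normalized gradient descent step

print("critic weights:", np.round(W, 4))
print("control at x = 1.0:", d - lam * np.tanh((W @ dphi(1.0)) / (2.0 * lam * Ru)))
```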
Funding: Project supported by the Open Research Project from the SKLMCCS (Grant No. 20120106), the Fundamental Research Funds for the Central Universities of China (Grant No. FRF-TP-13-018A), the Postdoctoral Science Foundation of China (Grant No. 2013M530527), the National Natural Science Foundation of China (Grant Nos. 61304079 and 61374105), and the Natural Science Foundation of Beijing, China (Grant Nos. 4132078 and 4143065).
Abstract: We develop an online adaptive dynamic programming (ADP)-based optimal control scheme for continuous-time chaotic systems. The idea is to use the ADP algorithm to obtain the optimal control input that makes the performance index function reach an optimum. The expression of the performance index function for the chaotic system is first presented, and the online ADP algorithm is then given to achieve optimal control. In the ADP structure, neural networks are used to construct a critic network and an action network, which obtain an approximate performance index function and the control input, respectively. It is proven that the critic parameter error dynamics and the closed-loop chaotic system are uniformly ultimately bounded exponentially. Simulation results illustrate the performance of the established optimal control method.
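A rough sketch of such an online critic/actor loop is given below, with a simple nonlinear oscillator standing in for the chaotic plant and a shared quadratic feature basis for both networks. The dynamics, features, and gains are assumptions, and the update laws are deliberately simplified relative to the tuning laws analyzed in the paper (the actor here is simply pushed toward the control implied by the current critic).

```python
import numpy as np

# Assumed 2-D nonlinear plant dx/dt = f(x) + g u (a Van der Pol-type oscillator stands in
# for the chaotic system in this sketch)
def f(x):
    x1, x2 = x
    return np.array([x2, -x1 + (1.0 - x1**2) * x2])
g = np.array([0.0, 1.0])                                     # input channel
Qx, Ru = 1.0, 1.0

phi  = lambda x: np.array([x[0]**2, x[1]**2])                # shared quadratic features
dphi = lambda x: np.array([[2.0 * x[0], 0.0],
                           [0.0, 2.0 * x[1]]])               # Jacobian of phi w.r.t. x

Wc = np.zeros(2)                 # critic network weights
Wa = np.zeros(2)                 # action network weights
dt, lr_c, lr_a = 1e-3, 5.0, 2.0

x = np.array([1.0, -0.5])
for k in range(100_000):
    dVc = dphi(x).T @ Wc                                     # critic gradient w.r.t. the state
    u = -0.5 / Ru * g @ (dphi(x).T @ Wa)                     # control produced by the action network
    xdot = f(x) + g * u
    # Critic: normalized gradient descent on the Hamiltonian (Bellman) residual along the trajectory
    e_c = Qx * (x @ x) + Ru * u**2 + dVc @ xdot
    grad_c = dphi(x) @ xdot
    Wc -= lr_c * dt * e_c * grad_c / (1.0 + grad_c @ grad_c)**2
    # Actor: pushed toward the control law implied by the current critic (simplified tuning law)
    Wa -= lr_a * dt * dphi(x) @ (dphi(x).T @ (Wa - Wc))
    x = x + dt * xdot                                        # Euler step of the closed loop

print("critic weights:", np.round(Wc, 3))
print("action weights:", np.round(Wa, 3))
```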
Abstract: This article introduces the state of the art in adaptive dynamic programming and reinforcement learning (ADPRL). First, algorithms in reinforcement learning (RL) are introduced and their roots in dynamic programming are illustrated. Adaptive dynamic programming (ADP) is then introduced following a brief discussion of dynamic programming. Researchers in ADP and RL have enjoyed rapid progress over the past decade, from algorithms, to convergence and optimality analyses, to stability results. Several key steps in the recent theoretical development of ADPRL are highlighted, together with some future perspectives. In particular, convergence and optimality results of value iteration and policy iteration are reviewed, followed by an introduction to the most recent results on stability analysis of value iteration algorithms.
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 61374105, 61233001, and 61273140) and in part by the Beijing Natural Science Foundation (Grant No. 4132078).
Abstract: In this paper, a novel iterative Q-learning algorithm, called the "policy iteration based deterministic Q-learning algorithm", is developed to solve optimal control problems for discrete-time deterministic nonlinear systems. The idea is to use an iterative adaptive dynamic programming (ADP) technique to construct the iterative control law that optimizes the iterative Q function. When the optimal Q function is obtained, the optimal control law can be achieved by directly minimizing the optimal Q function, so a mathematical model of the system is not necessary. A convergence analysis shows that the iterative Q function is monotonically non-increasing and converges to the solution of the optimality equation. It is also proven that each of the iterative control laws is a stable control law. Neural networks are employed to implement the policy iteration based deterministic Q-learning algorithm by approximating the iterative Q function and the iterative control law, respectively. Finally, two simulation examples are presented to illustrate the performance of the developed algorithm.
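As a toy illustration of the policy-iteration-style Q-learning loop (greedy improvement of the control law followed by evaluation of its Q function, with no explicit model used once the transition table is recorded), the sketch below works on a discretized scalar plant. The dynamics, grids, and utility are assumptions; the paper uses neural-network approximation rather than tables.

```python
import numpy as np

# Assumed scalar plant x_{k+1} = 0.8 x_k + u_k, discretized onto state/action grids
xs = np.linspace(-1.0, 1.0, 41)                      # state grid
us = np.linspace(-0.5, 0.5, 21)                      # action grid

C = np.array([[x**2 + u**2 for u in us] for x in xs])                  # utility U(x, u)
NI = np.array([[np.argmin(np.abs(xs - (0.8 * x + u))) for u in us]
               for x in xs])                                           # next-state index table

Q = C.copy()                         # initial Q function (its greedy law is admissible here)
for it in range(50):
    pi = Q.argmin(axis=1)            # policy improvement: greedy iterative control law
    for _ in range(300):             # policy evaluation of that law by successive substitution
        Q = C + Q[NI, pi[NI]]

pi = Q.argmin(axis=1)
print("learned control at x = 1.0:", us[pi[-1]])
print("learned control at x = -1.0:", us[pi[0]])
```

The greedy step minimizes the current Q function directly, which is the model-free improvement the abstract describes.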