A stochastic optimal control strategy for partially observable nonlinear quasi-Hamiltonian systems is proposed. The optimal control forces consist of two parts. The first part is determined by the conditions under which the stochastic optimal control problem of a partially observable nonlinear system is converted into that of a completely observable linear system. The second part is determined by solving the dynamical programming equation derived by applying the stochastic averaging method and the stochastic dynamical programming principle to the completely observable linear control system. The response of the optimally controlled quasi-Hamiltonian system is predicted by solving the averaged Fokker-Planck-Kolmogorov equation associated with the optimally controlled completely observable linear system and by solving the Riccati equation for the estimation error of the system states. An example is given to illustrate the procedure and effectiveness of the proposed control strategy.
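The conversion to a completely observable problem leans on a filter-type Riccati equation for the estimation-error covariance of the system states. The snippet below is a minimal sketch of that ingredient for a hypothetical two-state linear surrogate; the matrices A, C, Q, R are assumed for illustration and are not taken from the paper. It uses the duality between the filter Riccati equation and the control algebraic Riccati equation.

```python
# A minimal sketch (not the paper's algorithm): stationary estimation-error
# covariance P from the filter-type algebraic Riccati equation
#   A P + P A^T - P C^T R^{-1} C P + Q = 0
# for an assumed 2-state linear surrogate system.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, -0.1]])   # assumed drift matrix of the linear surrogate
C = np.array([[1.0, 0.0]])                 # only the first state is observed
Q = 0.05 * np.eye(2)                       # excitation intensity (assumed)
R = np.array([[0.01]])                     # observation-noise intensity (assumed)

# Duality: the filter Riccati equation is the control ARE for the pair (A^T, C^T).
P = solve_continuous_are(A.T, C.T, Q, R)
K = P @ C.T @ np.linalg.inv(R)             # steady-state filter gain
print("error covariance P =\n", P)
print("filter gain K =\n", K)
```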
A new stochastic optimal control strategy for randomly excited quasi-integrable Hamiltonian systems using magneto-rheological (MR) dampers is proposed. The dynamic behavior of an MR damper is characterized by the Bouc-Wen hysteretic model. The control force produced by the MR damper is separated into a passive part incorporated in the uncontrolled system and a semi-active part to be determined. The system combining the Bouc-Wen hysteretic force is converted into an equivalent non-hysteretic nonlinear stochastic control system. Then Itô stochastic differential equations are derived from the equivalent system by using the stochastic averaging method. A dynamical programming equation for the controlled diffusion processes is established based on the stochastic dynamical programming principle. The non-clipping nonlinear optimal control law is obtained for a certain performance index by minimizing the dynamical programming equation. Finally, an example is given to illustrate the application and effectiveness of the proposed control strategy.
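As a rough illustration of the hysteretic force model mentioned above, the sketch below integrates the Bouc-Wen internal variable under a prescribed sinusoidal displacement and forms the resulting damper force. All parameter values are assumed for illustration only and are not identified from an actual MR damper.

```python
# A minimal sketch: Bouc-Wen hysteretic force of an MR-type damper under a
# prescribed sinusoidal displacement (illustrative parameters, not the paper's).
import numpy as np

c0, k0, alpha = 50.0, 25.0, 900.0        # viscous, stiffness, hysteresis scaling (assumed)
A_bw, beta, gamma, n = 1.0, 0.5, 0.5, 2  # Bouc-Wen shape parameters (assumed)

dt, T = 1e-4, 2.0
t = np.arange(0.0, T, dt)
x = 0.02 * np.sin(2 * np.pi * 2.0 * t)   # prescribed 2 Hz displacement
xd = np.gradient(x, dt)                  # velocity

z = np.zeros_like(t)                     # hysteretic internal variable
for k in range(len(t) - 1):
    zdot = A_bw * xd[k] - beta * abs(xd[k]) * abs(z[k])**(n - 1) * z[k] \
           - gamma * xd[k] * abs(z[k])**n
    z[k + 1] = z[k] + dt * zdot          # explicit Euler step

F = c0 * xd + k0 * x + alpha * z         # total damper force
print("peak force [N]:", F.max())
```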
In this paper, a new approach using the linear combination property of intervals and discretization is proposed to solve, in three phases, a class of nonlinear optimal control problems containing a nonlinear system and a linear functional. In the first phase, the linear combination property of intervals is used to change the nonlinear system into an equivalent linear system; in the second phase, the discretization method converts the resulting problem into a linear programming problem; and in the third phase, the latter problem is solved by linear programming methods. In addition, the efficiency of the approach is confirmed by some numerical examples.
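A hypothetical scalar instance of the second and third phases: once the dynamics are linear (or have been linearized) and the functional is linear, discretizing in time and eliminating the states yields an ordinary linear program. The example below is an assumed minimum-effort transfer problem, not one of the paper's numerical examples.

```python
# A minimal sketch: discretized linear dynamics + linear cost -> linear program.
import numpy as np
from scipy.optimize import linprog

a, b, dt, N = -0.5, 1.0, 0.1, 20          # assumed scalar dynamics x' = a x + b u
x0, x_target, u_max = 0.0, 1.0, 2.0

# Eliminate the states: x_N = phi**N x0 + sum_k phi**(N-1-k) * dt*b * u_k
phi = 1.0 + dt * a
A_eq = np.array([[phi**(N - 1 - k) * dt * b for k in range(N)]])
b_eq = np.array([x_target - phi**N * x0])

c = dt * np.ones(N)                       # linear cost: total control effort
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, u_max)] * N)
print("optimal cost:", res.fun)
```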
Reinforcement learning (RL) has roots in dynamic programming and is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, and the main results for discrete-time systems and continuous-time systems are surveyed, respectively. Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environments is discussed, covering event-based design, robust stabilization, and game design. Moreover, the extensions of ADP for addressing control problems under complex environments attract enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they significantly advance the ADP formulation. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, this comprehensive survey of ADP and RL for advanced control applications demonstrates their remarkable potential in the artificial intelligence era; they also play a vital role in promoting environmental protection and industrial intelligence.
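For readers new to the topic, the sketch below shows the most basic ADP recursion the surveyed methods build on: discrete-time value iteration for an assumed linear-quadratic problem, whose value-function kernel converges to the solution of the discrete algebraic Riccati equation.

```python
# A minimal sketch (assumed LQR example, not from the survey): value iteration
#   P_{j+1} = Q + A^T P_j A - A^T P_j B (R + B^T P_j B)^{-1} B^T P_j A.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed discrete-time double integrator
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[1.0]])

P = np.zeros((2, 2))                     # value-function kernel, V_j(x) = x^T P_j x
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # greedy gain from the current value
    P = Q + A.T @ P @ (A - B @ K)                       # value update
print("converged gain K =", K)
```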
This paper studies the problem of optimal parallel tracking control for continuous-time general nonlinear systems. Unlike existing optimal state feedback control, the control input of the optimal parallel control is introduced into the feedback system. However, due to the introduction of the control input into the feedback system, optimal state feedback control methods cannot be applied directly. To address this problem, an augmented system and an augmented performance index function are first proposed. Thus, the general nonlinear system is transformed into an affine nonlinear system. The difference between the optimal parallel control and the optimal state feedback control is analyzed theoretically. It is proven that the optimal parallel control with the augmented performance index function can be seen as the suboptimal state feedback control with the traditional performance index function. Moreover, an adaptive dynamic programming (ADP) technique is utilized to implement the optimal parallel tracking control, using a critic neural network (NN) to approximate the value function online. The stability analysis of the closed-loop system is performed using Lyapunov theory, and the tracking error and NN weight errors are shown to be uniformly ultimately bounded (UUB). Also, the optimal parallel controller guarantees the continuity of the control input even when there are finite jump discontinuities in the reference signals. Finally, the effectiveness of the developed optimal parallel control method is verified in two cases.
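The critic update used in this kind of ADP design can be illustrated with a generic single-critic scheme; the sketch below assumes its own dynamics, cost, and basis functions and is not the paper's parallel-control law. The weights of a quadratic critic are tuned by normalized gradient descent on the Hamiltonian (HJB) residual along a simulated trajectory.

```python
# A minimal sketch of online critic tuning for an assumed affine system
#   x' = f(x) + g(x) u  with cost  x^T Q x + R u^2.
import numpy as np

f = lambda x: np.array([x[1], -x[0] - 0.5 * x[1] + 0.1 * x[0] * x[1]])  # assumed drift
g = lambda x: np.array([0.0, 1.0])                                      # assumed input map
Q, R = np.eye(2), 1.0

phi  = lambda x: np.array([x[0]**2, x[0] * x[1], x[1]**2])              # critic basis
dphi = lambda x: np.array([[2 * x[0], x[1], 0.0],
                           [0.0, x[0], 2 * x[1]]]).T                    # basis Jacobian, (3, 2)

W = np.ones(3)                       # critic weights, V_hat(x) = W^T phi(x)
x = np.array([1.0, -1.0])
dt, alpha = 0.002, 2.0
for _ in range(20000):
    grad_V = dphi(x).T @ W                                 # gradient of V_hat
    u = -0.5 / R * g(x) @ grad_V                           # greedy control from the critic
    xdot = f(x) + g(x) * u
    sigma = dphi(x) @ xdot                                 # regressor along the trajectory
    delta = W @ sigma + x @ Q @ x + R * u**2               # Hamiltonian (HJB) residual
    W -= dt * alpha * sigma / (1.0 + sigma @ sigma)**2 * delta
    x = x + dt * xdot
    if np.linalg.norm(x) < 1e-3:                           # restart to keep the data exciting
        x = np.random.uniform(-1, 1, 2)
print("critic weights after tuning:", W)
```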
In this paper, an online optimal distributed learning algorithm is proposed to solve the leader-synchronization problem of nonlinear multi-agent differential graphical games. Each player approximates its optimal control policy using single-network approximate dynamic programming (ADP), in which only one critic neural network (NN) is employed instead of the typical actor-critic structure composed of two NNs. The proposed distributed weight tuning laws for the critic NNs guarantee stability in the sense of uniform ultimate boundedness (UUB) and convergence of the control policies to the Nash equilibrium. By introducing novel distributed local operators in the weight tuning laws, the requirement for initial stabilizing control policies is removed. Furthermore, the overall closed-loop system stability is guaranteed by Lyapunov stability analysis. Finally, simulation results show the effectiveness of the proposed algorithm.
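The quantity each player's critic operates on is the local neighborhood synchronization error. A minimal sketch for an assumed four-agent graph with one pinned agent (not the paper's simulation example) is shown below.

```python
# A minimal sketch: local neighborhood synchronization error
#   e_i = sum_j a_ij (x_i - x_j) + g_i (x_i - x_0)
# for an assumed 4-agent line graph with the leader pinned to agent 1.
import numpy as np

A = np.array([[0, 1, 0, 0],      # adjacency of the communication graph (assumed)
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
g_pin = np.array([1.0, 0.0, 0.0, 0.0])     # only agent 1 observes the leader

x  = np.array([0.4, -0.2, 0.9, 0.1])       # agent states (scalar, for illustration)
x0 = 0.5                                    # leader state

e = A.sum(axis=1) * x - A @ x + g_pin * (x - x0)
print("neighborhood errors:", e)
```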
In this paper, two different control strategies designed to alleviate the response of quasi partially integrable Hamiltonian systems subjected to stochastic excitation are proposed. First, by using the stochastic averaging method for quasi partially integrable Hamiltonian systems, an n-DOF controlled quasi partially integrable Hamiltonian system with stochastic excitation is converted into a set of partially averaged Itô stochastic differential equations. Then, the dynamical programming equation associated with the partially averaged Itô equations is formulated by applying the stochastic dynamical programming principle. In the first control strategy, the optimal control law is derived from the dynamical programming equation and the control constraints without solving the dynamical programming equation. In the second control strategy, the optimal control law is obtained by solving the dynamical programming equation. Finally, the responses of both the controlled and uncontrolled systems are predicted by solving the Fokker-Planck-Kolmogorov equation associated with the fully averaged Itô equations. An example is worked out to illustrate the application and effectiveness of the two proposed control strategies.
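To fix ideas, the generic structure of the objects mentioned above, in the form standard for stochastic averaging of quasi-Hamiltonian systems, is sketched below; this is only the generic form, not the paper's specific equations.

```latex
% Partially averaged Ito equations for the slowly varying Hamiltonians H_r
dH_r = \left[ \bar m_r(\mathbf H) + \Big\langle \frac{\partial H_r}{\partial p_i}\, u_i \Big\rangle \right] dt
       + \bar\sigma_{rk}(\mathbf H)\, dB_k(t), \qquad r = 1,\dots,n .

% Dynamical programming equation for an average-cost criterion
% (\gamma is the optimal average cost, f the running cost)
\gamma = \min_{\mathbf u}\left[ \sum_r \left( \bar m_r + \Big\langle \frac{\partial H_r}{\partial p_i}\, u_i \Big\rangle \right) \frac{\partial V}{\partial H_r}
      + \frac12 \sum_{r,s} \bar b_{rs}(\mathbf H)\, \frac{\partial^2 V}{\partial H_r\, \partial H_s}
      + f(\mathbf H, \mathbf u) \right].
```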
An identification problem is considered in which inaccurate measurements of the dynamics on a time interval are given. The model has the form of ordinary differential equations that are linear with respect to the unknown parameters. A new approach is presented to solve the identification problem in the framework of optimal control theory. A numerical algorithm based on the dynamic programming method is suggested to identify the unknown parameters. Results of simulations are presented.
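A naive baseline that exploits the same linear-in-parameters structure is ordinary least squares on a finite-difference derivative estimate; the sketch below illustrates that structure only and is not the paper's dynamic-programming algorithm. The model form and parameter values are assumed.

```python
# A minimal sketch: for dynamics linear in the unknown parameters,
#   x' = F(x) theta, noisy samples of x(t) determine theta by regression.
import numpy as np

rng = np.random.default_rng(0)
theta_true = np.array([-1.5, 0.8])                 # "unknown" parameters (assumed)

dt, T = 0.01, 5.0
t = np.arange(0.0, T, dt)
x = np.empty_like(t); x[0] = 1.0
for k in range(len(t) - 1):                        # simulate x' = theta1*x + theta2*sin(x)
    x[k + 1] = x[k] + dt * (theta_true[0] * x[k] + theta_true[1] * np.sin(x[k]))

y = x + 0.01 * rng.standard_normal(x.shape)        # inaccurate measurements

xdot = np.gradient(y, dt)                          # crude derivative estimate
F = np.column_stack([y, np.sin(y)])                # regressor F(x) = [x, sin x]
theta_hat, *_ = np.linalg.lstsq(F, xdot, rcond=None)
print("estimated parameters:", theta_hat)
```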
Approximate dynamic programming (ADP) implemented with an adaptive critic (AC)-based neural network (NN) structure has evolved as a powerful technique for solving the Hamilton-Jacobi-Bellman (HJB) equations. As interest in ADP and AC solutions escalates with time, there is a dire need to consider possible enabling factors for their implementation. A typical AC structure consists of two interacting NNs, which is computationally expensive. In this paper, a new architecture, called the cost-function-based single network adaptive critic (J-SNAC), is presented, which eliminates one of the networks in a typical AC structure. This approach is applicable to a wide class of nonlinear systems in engineering. In order to demonstrate the benefits and the control synthesis with the J-SNAC, two problems have been solved with both the AC and the J-SNAC approaches. Results are presented which show savings of about 50% of the computational costs by the J-SNAC while maintaining the same accuracy levels as the dual network structure in solving for optimal control. Furthermore, convergence of the J-SNAC iterations, which reduces to a least-squares problem, is discussed; for linear systems, the iterative process is shown to reduce to solving the familiar algebraic Riccati equation.
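The remark about linear systems can be checked with a standard adaptive-critic-style policy iteration, whose evaluation step is a linear (least-squares-type) solve and whose fixed point is the solution of the discrete algebraic Riccati equation. The sketch below uses assumed system matrices and an assumed initial stabilizing gain, and is offered only as an illustration of that remark, not as the J-SNAC algorithm itself.

```python
# A minimal sketch: LQR policy iteration converging to the DARE solution.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])           # assumed system matrices
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[1.0]])

K = np.array([[1.0, 1.7]])                       # an initial stabilizing gain (assumed)
for _ in range(50):
    Acl = A - B @ K
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)   # critic / evaluation step
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)     # actor / improvement step

print("policy-iteration P:\n", P)
print("DARE solution     :\n", solve_discrete_are(A, B, Q, R))
```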