Funding: Supported in part by the National Natural Science Foundation of China (62222301, 62073085, 62073158, 61890930-5, 62021003), the National Key Research and Development Program of China (2021ZD0112302, 2021ZD0112301, 2018YFC1900800-5), and the Beijing Natural Science Foundation (JQ19013).
Abstract: Reinforcement learning (RL) has roots in dynamic programming and is known as adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP and RL and their applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, surveying the main results for discrete-time systems and continuous-time systems, respectively. Then, the research progress on adaptive critic control within the event-triggered framework and under uncertain environments is discussed, covering event-based design, robust stabilization, and game design. Moreover, extensions of ADP for addressing control problems in complex environments have attracted enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they significantly advance ADP formulations. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, this comprehensive survey of ADP and RL for advanced control applications demonstrates their remarkable potential in the artificial intelligence era, as well as their vital role in promoting environmental protection and industrial intelligence.
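In their simplest form, the adaptive-critic designs surveyed above can be sketched as value iteration for a discrete-time linear-quadratic regulator, where the "critic" is the cost matrix P and the "actor" is the greedy gain K. The plant matrices and iteration count below are illustrative assumptions, not taken from any specific paper in the survey.

```python
import numpy as np

# Illustrative discrete-time plant and quadratic costs (hypothetical values).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)          # state cost
R = np.array([[1.0]])  # control cost

def adp_value_iteration(A, B, Q, R, iters=300):
    """Iterate the critic (cost matrix P) from P = 0; return P and greedy gain K."""
    P = np.zeros_like(Q)
    for _ in range(iters):
        # Greedy (actor) improvement for the current critic P.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # One-step critic update (discrete-time Riccati recursion).
        P = Q + A.T @ P @ (A - B @ K)
    return P, K

P, K = adp_value_iteration(A, B, Q, R)
```

At convergence, P is a fixed point of the recursion and the closed loop A - BK is stable, which is the discrete-time counterpart of the HJB-based critics discussed in the survey.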
Funding: Supported in part by the National Science Foundation of China (62173183).
Abstract: In this paper, guaranteed cost attitude tracking control for an uncertain quadrotor unmanned aerial vehicle (QUAV) under safety constraints is studied. First, an augmented system is constructed from the tracking error system and the reference system; this transformation converts the tracking control problem into a stabilization control problem. Then, a control barrier function and a disturbance attenuation function are designed to characterize violations of the safety constraints and tolerance of uncertain disturbances, and they are incorporated into the reward function as penalty terms. Based on the modified reward function, the problem is simplified to the optimal regulation problem of the nominal augmented system, and a new Hamilton-Jacobi-Bellman equation is developed. Finally, a critic-only reinforcement learning algorithm with a concurrent learning technique is employed to solve the Hamilton-Jacobi-Bellman equation and obtain the optimal controller. The proposed algorithm not only keeps the reward function within an upper bound in the presence of uncertain disturbances, but also enforces the safety constraints. The performance of the algorithm is evaluated by numerical simulation.
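The idea of folding a safety constraint into the reward as a penalty term can be sketched with a simple log-barrier penalty. The scalar safe set h(x) = 1 - x² ≥ 0 and the weights below are hypothetical choices for illustration; the paper's actual control barrier and disturbance attenuation functions may differ.

```python
import numpy as np

def safe_reward(x, u, q=1.0, r=1.0, mu=0.1):
    """Quadratic cost plus a log-barrier penalty that blows up as the
    (hypothetical) safe set h(x) = 1 - x**2 >= 0 is about to be violated."""
    h = 1.0 - x**2                     # safe-set function; h > 0 means safe
    assert h > 0, "state outside the safe set"
    return q * x**2 + r * u**2 + mu * (-np.log(h))
```

The penalty is negligible deep inside the safe set and grows without bound near its boundary, so any controller minimizing the accumulated reward is pushed to keep the state safe.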
Funding: Supported in part by the National Key Research and Development Program of China (2021ZD0201300) and in part by the National Natural Science Foundation of China (61860206008, 61933012).
Abstract: In this paper, we present a novel adaptive performance control approach for strict-feedback nonparametric systems with unknown time-varying control coefficients, which mainly includes the following steps. First, by introducing several key transformation functions and selecting the initial value of the time-varying scaling function, symmetric prescribed performance with global and semi-global properties can be handled uniformly, without the need for control re-design. Second, to handle unknown time-varying control coefficients with unknown signs, we propose an enhanced Nussbaum function (ENF) with some unique properties and characteristics, so that the complex stability analysis based on specific Nussbaum functions, as commonly used, is no longer required. Third, by utilizing the core-function information technique, the nonparametric uncertainties in the system are gracefully handled so that no approximator is required. Finally, simulation results verify the effectiveness and benefits of the approach.
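The defining property of a Nussbaum function, which is what lets an adaptive gain "search" for the unknown control sign, can be checked numerically. As a hedged illustration with the classical choice N(k) = k² cos(k) (not the enhanced function proposed in the paper), the running average (1/K)∫₀ᴷ N(s) ds takes arbitrarily large positive and negative values.

```python
import numpy as np

def nussbaum_average(K):
    """(1/K) * integral_0^K s^2 cos(s) ds, via the closed-form antiderivative
    s^2 sin(s) + 2 s cos(s) - 2 sin(s)."""
    integral = K**2 * np.sin(K) + 2 * K * np.cos(K) - 2 * np.sin(K)
    return integral / K

# Sample the running average over a range of horizons: it oscillates with
# ever-growing amplitude, hitting both large positive and negative values.
ks = np.linspace(1.0, 200.0, 20000)
avgs = nussbaum_average(ks)
```

It is exactly this two-sided unboundedness that stability arguments exploit: if the adaptive gain assumed the wrong control sign forever, the Lyapunov bound would be contradicted.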
Funding: Supported in part by the National Key Research and Development Program of China (2022YFB4701400/4701401), in part by the National Natural Science Foundation of China (61991400, 61991403, 62250710167, 61860206008, 61933012, 62273064, 62203078), in part by the National Key Research and Development Program of China (2021ZD0201300), in part by the Innovation Support Program for International Students Returning to China (cx2022016), and in part by the Chongqing Medical Scientific Research Project (2022DBXM001).
Abstract: In this paper, we consider the practical prescribed-time performance-guaranteed tracking control problem for a class of uncertain strict-feedback systems subject to unknown control directions. Due to unknown nonlinearities and uncertainties, it is challenging to design a controller that ensures stability of the closed-loop system within a predetermined finite time while maintaining the specified transient performance. The underlying problem becomes further complicated when the control directions are unknown. To deal with these problems, a special translation function as well as a Nussbaum-type function are introduced into the prescribed performance control (PPC) framework. Finally, a PPC scheme with preset finite-time tracking is designed, and its effectiveness is confirmed by both theoretical analysis and numerical simulation.
Funding: Supported in part by the National Natural Science Foundation of China (62320106008, 62373114) and in part by the Collaborative Innovation Center for Transportation Science and Technology of Guangzhou (202206010056).
Abstract: This paper presents a learning-based control policy design for point-to-point vehicle positioning in urban environments via BeiDou navigation. While navigating in urban canyons, the multipath effect is a form of interference that causes the navigation signal to drift, severely impacting vehicle localization due to the reflection and diffraction of the BeiDou signal. Here, the authors formulate the navigation control system with unknown vehicle dynamics as an optimal control-seeking problem for a linear discrete-time system, and the point-to-point localization control is modeled and handled by leveraging off-policy reinforcement learning for feedback control. The proposed learning-based design guarantees optimality with prescribed performance and stabilizes the closed-loop navigation system, without full knowledge of the vehicle dynamics. The proposed method withstands the impact of the multipath effect while satisfying the prescribed convergence rate. A case study demonstrates that the proposed algorithms effectively drive the vehicle to a desired setpoint under the multipath effect introduced by actual BeiDou navigation experiments in the urban environment.
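The off-policy, model-free flavor of such designs can be sketched with least-squares policy iteration on a scalar linear plant: transitions are generated by an arbitrary behavior policy, while the Q-function of the current target policy is fit by regression and used for greedy improvement. The plant (a, b), the costs, and the data-generation scheme below are illustrative assumptions, not the vehicle dynamics from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, q, r = 0.9, 0.5, 1.0, 1.0   # hypothetical scalar plant and costs

def lspi_gain(k, n=200):
    """One off-policy policy-iteration step: fit Q(x,u) = hxx*x^2 + 2*hxu*x*u
    + huu*u^2 for the current gain k from random transitions, then return the
    greedy gain hxu/huu (the minimizer of Q over u is u = -(hxu/huu)*x)."""
    x = rng.uniform(-2, 2, n)
    u = rng.uniform(-2, 2, n)          # arbitrary behavior inputs (off-policy)
    xn = a * x + b * u                 # observed next states
    un = -k * xn                       # target-policy action at the next state
    # Bellman equation Q(x,u) - Q(xn,un) = q*x^2 + r*u^2 is linear in theta.
    Phi = np.stack([x**2 - xn**2, 2 * (x * u - xn * un), u**2 - un**2], axis=1)
    theta, *_ = np.linalg.lstsq(Phi, q * x**2 + r * u**2, rcond=None)
    hxx, hxu, huu = theta
    return hxu / huu

k = 0.0                                # initial stabilizing gain (|a| < 1)
for _ in range(10):
    k = lspi_gain(k)                   # policy evaluation + improvement
```

Note that the plant parameters appear only in the data generator; the regression itself uses only observed (x, u, x') tuples, which is the sense in which such schemes avoid full model knowledge.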
Funding: Supported by the Guangdong Basic and Applied Basic Research Foundation (2024A1515011936) and the National Natural Science Foundation of China (62320106008).
Abstract: The concept of reward is fundamental in reinforcement learning, with a wide range of applications in the natural and social sciences. Seeking an interpretable reward for decision-making, one that largely shapes the system's behavior, has always been a challenge in reinforcement learning. In this work, we explore a discrete-time reward for reinforcement learning in continuous time and action spaces, which represent many phenomena captured by physical laws. We find that the discrete-time reward leads to the extraction of the unique continuous-time decision law and improves computational efficiency by dropping the integral operator that appears in classical results with integral rewards. We apply this finding to solve output-feedback design problems in power systems. The results reveal that our approach removes the intermediate stage of identifying dynamical models. Our work suggests that the discrete-time reward is efficient in the search for the desired decision law, providing a computational tool to understand and modify the behavior of large-scale engineering systems using the optimal learned decision.
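The contrast between integral and discrete-time rewards can be checked numerically on a toy trajectory x(t) = e^(-t): the sampled (Riemann-sum) reward approaches the classical integral reward as the sampling step shrinks. This is a hedged illustration of the sampling idea only, not the paper's output-feedback construction.

```python
import numpy as np

def integral_reward(T=5.0):
    """Closed form of the integral reward ∫_0^T x(t)^2 dt for x(t) = exp(-t)."""
    return (1.0 - np.exp(-2.0 * T)) / 2.0

def sampled_reward(dt, T=5.0):
    """Discrete-time counterpart: left Riemann sum over the sampled trajectory."""
    t = np.arange(0.0, T, dt)
    return np.sum(np.exp(-2.0 * t)) * dt

# The discretization error shrinks with the sampling step.
err_coarse = abs(sampled_reward(0.1) - integral_reward())
err_fine = abs(sampled_reward(0.001) - integral_reward())
```

The discrete-time form needs only sampled data, which is one intuition for why it can bypass the integral operator that appears in continuous-time formulations.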
Funding: Supported by the Deanship of Scientific Research at King Fahd University of Petroleum & Minerals (JF141002), the National Science Foundation (ECCS-1405173), the Office of Naval Research (N000141310562, N000141410718), the U.S. Army Research Office (W911NF-11-D-0001), the National Natural Science Foundation of China (61120106011), and Project 111 of the Ministry of Education of China (B08015).
Abstract: This paper introduces a model-free reinforcement learning technique for solving a class of dynamic games known as dynamic graphical games. The graphical game arises from multi-agent dynamical systems in which pinning control is used to make all the agents synchronize to the state of a command generator or leader agent. Novel coupled Bellman equations and Hamiltonian functions are developed for dynamic graphical games. Hamiltonian mechanics are used to derive the necessary conditions for optimality. The solution of the dynamic graphical game is given in terms of the solution to a set of coupled Hamilton-Jacobi-Bellman equations developed herein, and the Nash equilibrium solution of the graphical game is given in terms of the solution to these underlying coupled equations. An online model-free policy iteration algorithm is developed to learn the Nash solution for the dynamic graphical game; it does not require any knowledge of the agents' dynamics. A proof of convergence for this multi-agent learning algorithm is given under a mild assumption about the interconnectivity properties of the graph. A gradient descent technique with critic network structures is used to implement the policy iteration algorithm and solve the graphical game online in real time.
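The pinning-control setting behind graphical games can be sketched with single-integrator agents on a line graph: only one agent is pinned to the leader, yet all agents synchronize through the graph coupling. The graph, gains, and initial states below are illustrative assumptions, not the game-theoretic protocols developed in the paper.

```python
import numpy as np

# Illustrative line-graph adjacency: 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
g = np.array([1.0, 0.0, 0.0, 0.0])    # only agent 0 is pinned to the leader
x0 = 1.0                              # leader (command generator) state
x = np.array([0.0, -0.5, 0.5, 2.0])   # initial agent states
eps = 0.2                             # coupling step size

for _ in range(1000):
    # Graph coupling: sum_j a_ij * (x_j - x_i) for each agent i.
    coupling = A @ x - A.sum(axis=1) * x
    # Pinned agents additionally track the leader.
    x = x + eps * (coupling + g * (x0 - x))
```

Because the graph is connected and at least one agent is pinned, the matrix governing the synchronization error is Hurwitz-like in discrete time, and every agent converges to the leader state.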
Funding: This work was supported in part by the National Natural Science Foundation of China (61633007, 61703112), in part by the China Postdoctoral Science Foundation (2016M600643) and its special fund (2017T100618), and in part by the Office of Naval Research (N00014-17-1-2239, N00014-18-1-2221).
Abstract: In this paper, we design consensus algorithms for multiple unmanned aerial vehicles (UAVs). We mainly focus on control design in the face of measurement noise and propose a position consensus controller based on sliding mode control using distributed UAV information. Within the framework of Lyapunov theory, it is shown that all signals in the closed-loop multi-UAV system are stabilized by the proposed algorithm, while the consensus errors are uniformly ultimately bounded. Moreover, for each local UAV, we propose a mechanism to define trustworthiness, based on which the edge weights are tuned to eliminate the negative influence of stubborn agents or agents exposed to extremely noisy measurements. Finally, we develop software for a nano-UAV platform, on which we implement our algorithms to address measurement noise in UAV flight tests. The experimental results validate the effectiveness of the proposed algorithms.
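The trust idea can be sketched in scalar form: each agent down-weights a neighbor whose reported state deviates wildly from its own, which limits the pull of a stubborn agent on the rest of the network. The Gaussian trust kernel, gains, and initial states below are hypothetical choices, not the paper's sliding-mode design.

```python
import numpy as np

def trust_weight(xi, xj, sigma=1.0):
    """Trust decays with disagreement; a Gaussian kernel is one simple choice."""
    return np.exp(-((xi - xj) ** 2) / sigma**2)

x = np.array([0.0, 0.2, -0.3, 5.0])   # agent 3 is stubborn (never updates)
stubborn = 3
eps = 0.3                             # consensus step size

for _ in range(200):
    xn = x.copy()
    for i in range(4):
        if i == stubborn:
            continue                  # the stubborn agent ignores everyone
        # Edge weights tuned by trust: far-away (untrusted) values barely count.
        w = np.array([trust_weight(x[i], x[j]) for j in range(4)])
        w[i] = 0.0
        xn[i] = x[i] + eps * np.sum(w * (x - x[i]))
    x = xn
```

The three cooperative agents reach consensus near their own average, while the stubborn agent's outlying value is effectively filtered out by the vanishing trust weights.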
Abstract: This paper studies imitation learning in nonlinear multi-player game systems with heterogeneous control input dynamics. We propose a model-free, data-driven inverse reinforcement learning (RL) algorithm for a learner to find the cost functions of an N-player Nash expert system given the expert's states and control inputs. This allows us to address the imitation learning problem without prior knowledge of the expert's system dynamics. To achieve this, we first provide a basic model-based algorithm built upon RL and inverse optimal control. This serves as the foundation for our final model-free inverse RL algorithm, which is implemented via neural-network-based value function approximators. Theoretical analysis and simulation examples verify the methods.
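The model-based step, recovering cost weights that rationalize an observed expert policy, can be sketched for a single scalar linear-quadratic expert: the learner estimates the expert's feedback gain from state-input data and then inverts the Riccati stationarity conditions for the state weight q (with the input weight r normalized to 1). The plant, the true weights, and the data-generation scheme below are illustrative assumptions, far simpler than the paper's nonlinear N-player setting.

```python
import numpy as np

a, b, r = 0.9, 0.5, 1.0        # known scalar plant; input weight r normalized
q_true = 2.0                   # expert's hidden state weight

# Forward problem: expert's optimal gain from the scalar Riccati equation.
P = 1.0
for _ in range(5000):
    P = q_true + a * a * P - (a * b * P) ** 2 / (r + b * b * P)
k_expert = a * b * P / (r + b * b * P)

# Expert demonstrations: states and the corresponding optimal inputs u = -k*x.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100)
u = -k_expert * x

# Learner: estimate the gain by least squares, then invert the Riccati
# stationarity conditions k = a*b*P/(r + b^2*P) and
# P = q + r*k^2 + (a - b*k)^2 * P to recover q.
k_hat = -np.sum(x * u) / np.sum(x * x)
P_hat = k_hat * r / (b * (a - k_hat * b))
q_hat = P_hat * (1.0 - (a - b * k_hat) ** 2) - r * k_hat ** 2
```

The recovered q reproduces the expert's behavior exactly, which is the inverse-optimal-control foundation that the paper's model-free algorithm then approximates from data alone.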
Abstract: Cooperative control of multi-agent systems linked by communication networks is a well-developed and still growing field. The interplay of the individual agent dynamics and the communication graph topology results in intriguing and often surprising behaviors that are not manifested in the study of control systems for single-agent dynamics. The field brings together systems theory, feedback control, graph theory, communication systems, and complex systems theory to provide rigorous analysis and design for multiple dynamical systems interconnected by a graph information-flow structure. Applications have been made to vehicle formation control, coordinated multi-satellite control, electric power system control, robotics, autonomous airborne systems, manufacturing production lines, and the synchronization of dynamical processes in chemistry, physics, biology, and chaotic systems.