In this paper, an improved PID-neural network (IPIDNN) structure is proposed and applied to the critic and action networks of direct heuristic dynamic programming (DHDP). As an online learning algorithm of approximate dynamic programming (ADP), DHDP has demonstrated its applicability to large state and control problems. Theoretically, the DHDP algorithm requires access to full state feedback in order to obtain solutions to the Bellman optimality equation. Unfortunately, it is not always possible to access all the states of a real system. This paper proposes a solution by suggesting an IPIDNN configuration for the critic and action networks to achieve output feedback control. Since this structure can estimate the integrals and derivatives of measurable outputs, more system states are utilized and thus better control performance is expected. Compared with the traditional PID neural network (PIDNN), this configuration is flexible and easy to expand. Based on this structure, a gradient descent algorithm for the IPIDNN-based DHDP is presented. Convergence issues are addressed both within a single learning time step and for the entire learning process. Some important insights are provided to guide the implementation of the algorithm. The proposed learning controller has been applied to a cart-pole system to validate the effectiveness of the structure and the algorithm.
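As a rough illustration of the gradient-descent learning mentioned above, the sketch below performs a single semi-gradient critic update of the kind used in dHDP-style learning, with a plain linear critic standing in for the IPIDNN. All names, the learning rate, and the discount factor are illustrative assumptions, not the paper's implementation.

```python
def critic(w, x):
    """Linear cost-to-go estimate J(x) ~= w . x (stand-in for the IPIDNN critic)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def dhdp_critic_update(w, x, x_next, r, alpha=0.1, gamma=0.95):
    """One semi-gradient step on the temporal-difference error
    e = gamma * J(x_next) + r - J(x); the target is held fixed and the
    weights move along e * x to shrink e^2 / 2."""
    e = gamma * critic(w, x_next) + r - critic(w, x)
    return [wi + alpha * e * xi for wi, xi in zip(w, x)]
```

Repeating this update on the same transition drives the temporal-difference error toward zero, which is the per-time-step convergence behavior the abstract alludes to.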
Owing to extensive applications in many fields, the synchronization problem has been widely investigated in multi-agent systems. Synchronization is a pivotal issue for multi-agent systems: under the designed control policy, the output or state of each agent should become consistent with that of the leader. The purpose of this paper is to investigate heuristic dynamic programming (HDP)-based learning tracking control for discrete-time multi-agent systems, achieving synchronization while accounting for disturbances in the systems. Moreover, because the coupled Hamilton-Jacobi-Bellman equation is difficult to solve analytically, an improved HDP learning control algorithm is proposed to realize synchronization between the leader and all following agents, executed by an actor-critic neural network structure. The action and critic neural networks are utilized to learn the optimal control policy and the cost function, respectively, with an auxiliary action network introduced. Finally, two numerical examples and a practical application to mobile robots are presented to demonstrate the control performance of the HDP-based learning control algorithm.
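For context on the synchronization objective, a commonly used local neighborhood tracking error for agent i — a standard formulation in the multi-agent literature, not necessarily the exact one in this paper — can be computed as follows:

```python
def local_sync_error(x, x_leader, A, g, i):
    """e_i = sum_j A[i][j] * (x[i] - x[j]) + g[i] * (x[i] - x_leader),
    where A is the (directed) adjacency matrix of the communication graph
    and g[i] is agent i's pinning gain to the leader."""
    n = len(x)
    neighbor_term = sum(A[i][j] * (x[i] - x[j]) for j in range(n))
    return neighbor_term + g[i] * (x[i] - x_leader)
```

When every agent's state equals the leader's, this error vanishes, which is exactly the synchronization condition described above.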
Intelligent confrontation has become a vital technology for future air combat. Confrontation games between a penetrating aircraft and an intercepting aircraft are essential to modern air combat, and the performance indexes of both the interceptor and the penetrator must be considered. Traditional methods solve only one side's guidance problem, without considering the intelligence of the opponent. In this paper, an adaptive heuristic dynamic programming-based algorithm is proposed for aircraft confrontation games. The algorithm constructs a heuristic dynamic programming model for both confronting aircraft and then updates the critic and action network parameters using the dynamic confrontation state information. Numerical simulations indicate that the proposed algorithm can optimize the guidance law for both the interceptor and the penetrator and is therefore superior to traditional proportional navigation methods.
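The proportional navigation baseline referred to above is the classical textbook law a_c = N * V_c * (d lambda / dt): commanded lateral acceleration equals the navigation ratio times closing speed times line-of-sight rate. The one-liner below states that baseline, not the paper's adaptive algorithm:

```python
def pn_accel(nav_ratio, closing_speed, los_rate):
    """Classical proportional navigation: commanded lateral acceleration
    from navigation ratio N, closing speed Vc, and LOS rate lambda_dot."""
    return nav_ratio * closing_speed * los_rate
```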
At first sight it seems that advanced operations research is not used as much in continuous production systems as in mass production, batch production, and job shop systems. A more comprehensive evaluation, however, shows that advanced operations research techniques can be used very widely in continuous production systems in developing countries, because of initially inadequate plant layouts, stage-by-stage development of production lines, the purchase of second-hand machinery from various countries, and the plurality of customers. A production system planning case is presented for a chemical company in which the above conditions are largely present. The goals and constraints of this problem are as follows: (1) minimizing deviation from customers' requirements; (2) maximizing profit; (3) minimizing the frequency of changes in the production formula; (4) minimizing the inventory of final products; (5) balancing the production sections with regard to production rate; (6) limiting the inventory of raw material. The present situation is such that various techniques, including goal programming, linear programming, and dynamic programming, can be used. Dynamic production programming problems, however, fall into two categories: those with limited production capacity and those with unlimited production capacity. For the first category, a systematic and generally accepted solution has not yet been presented. Therefore, an innovative method is used to convert the dynamic situation into a zero-one model. Finally, the problem is transformed into a goal programming model with nonlinear constraints and solved using the GRG algorithm.
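As a toy illustration of the goal-programming idea (deviation variables weighted by goal priority), the sketch below solves a one-variable example by enumeration. The demand figure, weights, and search range are invented for illustration and are unrelated to the chemical company's actual GRG model:

```python
DEMAND = 10                     # hypothetical customer demand (units)
W_SHORT, W_EXCESS = 5.0, 1.0    # hypothetical priority weights: shortfall hurts more

def goal_cost(x):
    """Weighted goal-programming objective for production quantity x.
    d_minus / d_plus are the standard deviation variables: shortfall against
    the demand goal, and excess that ends up as final-product inventory."""
    d_minus = max(DEMAND - x, 0)
    d_plus = max(x - DEMAND, 0)
    return W_SHORT * d_minus + W_EXCESS * d_plus

# brute-force over a small integer range stands in for the real solver
best_x = min(range(0, 21), key=goal_cost)
```

The asymmetric weights capture the abstract's goal ordering: missing customer requirements (goal 1) is penalized more heavily than carrying inventory (goal 4).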
The residential energy scheduling of solar energy is an important research area of the smart grid. On the demand side, factors such as household loads, storage batteries, the outside public utility grid, and renewable energy resources combine into a nonlinear, time-varying, uncertain, and complex system that is difficult to manage or optimize. Many nations have already applied residential real-time pricing to balance the burden on their grids. In order to enhance the electricity efficiency of the residential microgrid, this paper presents an action-dependent heuristic dynamic programming (ADHDP) method to solve the residential energy scheduling problem. The highlights of this paper are as follows. First, weather-type classification is adopted to establish three types of programming models based on the features of solar energy; in addition, priorities are set for the different energy resources to reduce losses in electrical energy transmission. Second, three ADHDP-based neural networks, which can update themselves during application, are designed to manage the flows of electricity. Third, simulation results show that the proposed scheduling method effectively reduces the total electricity cost and improves load balancing. A comparison with the particle swarm optimization algorithm further proves that the present method has a promising effect on energy management and cost saving.
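The resource-priority rule mentioned above (renewables before storage before the utility grid) can be sketched as a simple greedy dispatch. The function and the numbers in the test are illustrative assumptions, not the paper's ADHDP controller:

```python
def dispatch(load_kw, solar_kw, battery_kw_max):
    """Serve the household load from the highest-priority source first:
    solar, then battery discharge (capped at battery_kw_max), then grid."""
    from_solar = min(load_kw, solar_kw)
    remaining = load_kw - from_solar
    from_battery = min(remaining, battery_kw_max)
    from_grid = remaining - from_battery
    return {'solar': from_solar, 'battery': from_battery, 'grid': from_grid}
```

In the paper this priority ordering is one ingredient; the ADHDP networks additionally learn when to store or sell energy under real-time pricing.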
This paper is concerned with a novel integrated multi-step heuristic dynamic programming (MsHDP) algorithm for solving optimal control problems. It is shown that, initialized by the zero cost function, MsHDP converges to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. The stability of the system under the control policies generated by MsHDP is then analyzed, and a general stability criterion is designed to determine the admissibility of the current control policy; the criterion is applicable not only to traditional value iteration and policy iteration but also to MsHDP. Further, based on the convergence result and the stability criterion, an integrated MsHDP algorithm using immature control policies is developed to greatly accelerate learning. An actor-critic structure is utilized to implement the integrated MsHDP scheme, where neural networks serve as the parametric architecture to evaluate and improve the iterative policy. Finally, two simulation examples demonstrate that the learning effectiveness of the integrated MsHDP scheme surpasses that of other fixed or integrated methods.
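To give a feel for the multi-step idea, the toy sketch below runs N-step greedy backups on a two-state deterministic MDP, starting from the zero cost function as in the convergence result above. The MDP, costs, and discount factor are purely illustrative; each N-step sweep contracts the error by a factor of gamma^N instead of gamma, which is the source of the speed-up:

```python
GAMMA = 0.9
COST = {('s0', 'a0'): 1.0, ('s0', 'a1'): 2.0, ('s1', 'a0'): 0.5, ('s1', 'a1'): 1.5}
NEXT = {('s0', 'a0'): 's1', ('s0', 'a1'): 's0', ('s1', 'a0'): 's0', ('s1', 'a1'): 's1'}

def ms_backup(V, s, n):
    """N-step greedy backup: minimize one-step cost plus the discounted
    (n-1)-step backup of the successor state; n = 1 is ordinary value iteration."""
    if n == 0:
        return V[s]
    return min(COST[s, a] + GAMMA * ms_backup(V, NEXT[s, a], n - 1)
               for a in ('a0', 'a1'))

def ms_value_iteration(n_steps, sweeps=100):
    V = {'s0': 0.0, 's1': 0.0}          # zero initial cost function
    for _ in range(sweeps):
        V = {s: ms_backup(V, s, n_steps) for s in V}
    return V
```

Both the single-step and multi-step variants converge to the same HJB fixed point on this toy problem; the multi-step variant simply gets there in fewer sweeps.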
We address a state-of-the-art reinforcement learning (RL) control approach that automatically configures robotic prosthesis impedance parameters to enable end-to-end, continuous locomotion for transfemoral amputee subjects. Specifically, our actor-critic based RL provides tracking control of a robotic knee prosthesis to mimic the intact-knee profile. This is a significant advance over our previous RL-based automatic tuning of prosthesis control parameters, which centered on regulation control with a designer-prescribed robotic knee profile as the target. In addition to presenting the tracking control algorithm based on direct heuristic dynamic programming (dHDP), we provide a control performance guarantee, including the case of constrained inputs. We show that the proposed tracking control possesses several important properties, such as weight convergence of the learning networks, Bellman (sub)optimality of the cost-to-go value function and control input, and practical stability of the human-robot system. We further provide a systematic simulation of the proposed tracking control using a realistic human-robot system simulator, OpenSim, to emulate how dHDP enables level-ground walking, walking on different terrains, and walking at different paces. These results show that the proposed dHDP-based tracking control is not only theoretically suitable but also practically useful.
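The impedance parameters being tuned are typically those of the standard knee impedance law tau = k*(theta_e - theta) - b*theta_dot used in robotic prosthesis control. The snippet below states that common formulation (not necessarily the paper's exact controller); the RL tuner's job is to pick k, b, and theta_e per gait phase:

```python
def knee_torque(theta, theta_dot, k, b, theta_e):
    """Impedance control law for a robotic knee: stiffness k pulls the joint
    angle theta toward the equilibrium angle theta_e, damping b opposes the
    angular velocity theta_dot."""
    return k * (theta_e - theta) - b * theta_dot
```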
Funding (IPIDNN-based DHDP paper): supported by the National Natural Science Foundation of China under Cooperative Research Funds (No. 50828701); the third author was also supported by the U.S. National Science Foundation (No. ECCS-0702057).
Funding (multi-agent HDP tracking control paper): this work was supported by the Tianjin Natural Science Foundation under Grant 20JCYBJC00880, the Beijing Key Laboratory Open Fund of Long-Life Technology of Precise Rotation and Transmission Mechanisms, and the Guangdong Provincial Key Laboratory of Intelligent Decision and Cooperative Control.
Funding (aircraft confrontation paper): supported by the China Postdoctoral Science Foundation (Grant No. 2020M681750).
Funding (residential energy scheduling paper): supported in part by the National Natural Science Foundation of China (61533017, U1501251, 61374105, 61722312).
Funding (MsHDP paper): the National Key Research and Development Program of China (2021ZD0112302), the National Natural Science Foundation of China (62222301, 61890930-5, 62021003), and the Beijing Natural Science Foundation (JQ19013).
Funding (robotic prosthesis tracking control paper): this work was partly supported by the National Science Foundation (1563921, 1808752, 1563454, 1808898).