Reinforcement learning (RL) has roots in dynamic programming, and it is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, where the main results for discrete-time systems and continuous-time systems are surveyed, respectively. Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environments is discussed, where event-based design, robust stabilization, and game design are reviewed. Moreover, the extensions of ADP for addressing control problems under complex environments have attracted enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they significantly promote the ADP formulation. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, the comprehensive survey of ADP and RL for advanced control applications has demonstrated their remarkable potential in the artificial intelligence era. In addition, they play a vital role in promoting environmental protection and industrial intelligence.
Organizations are adopting the Bring Your Own Device (BYOD) concept to enhance productivity and reduce expenses. However, this trend introduces security challenges, such as unauthorized access. Traditional access control systems, such as Attribute-Based Access Control (ABAC) and Role-Based Access Control (RBAC), are limited in their ability to enforce access decisions due to the variability and dynamism of attributes related to users and resources. This paper proposes a method for enforcing access decisions that is adaptable and dynamic, based on multilayer hybrid deep learning techniques, particularly the Tabular Deep Neural Network (Tabular DNN) method. This technique transforms all input attributes in an access request into a binary classification (allow or deny) using multiple layers, ensuring accurate and efficient access decision-making. The proposed solution was evaluated using the Kaggle Amazon access control policy dataset and achieved a 94% accuracy rate. Additionally, the proposed solution enhances the implementation of access decisions based on a variety of resource and user attributes while ensuring privacy through indirect communication with the Policy Administration Point (PAP). This solution significantly improves the flexibility of access control systems, making them more dynamic and adaptable to the evolving needs of modern organizations. Furthermore, it offers a scalable approach to managing the complexities associated with the BYOD environment, providing a robust framework for secure and efficient access management.
This paper is concerned with a novel integrated multi-step heuristic dynamic programming (MsHDP) algorithm for solving optimal control problems. It is shown that, initialized by the zero cost function, MsHDP can converge to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. Then, the stability of the system is analyzed using control policies generated by MsHDP. Also, a general stability criterion is designed to determine the admissibility of the current control policy. That is, the criterion is applicable not only to traditional value iteration and policy iteration but also to MsHDP. Further, based on the convergence and the stability criterion, the integrated MsHDP algorithm using immature control policies is developed to greatly accelerate learning. In addition, an actor-critic structure is utilized to implement the integrated MsHDP scheme, where neural networks serve as the parametric architecture used to evaluate and improve the iterative policy. Finally, two simulation examples are given to demonstrate that the learning effectiveness of the integrated MsHDP scheme surpasses that of other fixed or integrated methods.
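As a rough illustration of the zero-initialized iteration described in this abstract, the following Python sketch runs a multi-step value iteration on a scalar linear-quadratic problem. The system x_{k+1} = a*x_k + b*u_k, the quadratic stage cost, the function name `multi_step_value_iteration`, and the parameter `n_steps` are all assumptions made for illustration. It is not the paper's integrated MsHDP algorithm and it uses no neural networks. With `n_steps=1` the update reduces to ordinary value iteration.

```python
def multi_step_value_iteration(a, b, q, r, n_steps=1, iterations=60):
    """Iterate V_i(x) = p_i * x**2 for the assumed scalar system
    x_{k+1} = a*x_k + b*u_k with stage cost q*x**2 + r*u**2.

    Returns the final coefficient p and the list of all iterates.
    """
    p = 0.0  # zero initial cost function, as in the abstract
    history = [p]
    for _ in range(iterations):
        # Greedy policy u = -k*x with respect to the current value p*x**2.
        k = a * b * p / (r + b * b * p)
        c = a - b * k              # closed-loop dynamics x_{k+1} = c*x_k
        stage = q + r * k * k      # stage cost per unit x**2 under this policy
        # Accumulate n_steps stage costs along the closed-loop trajectory,
        # then bootstrap with the old value at the terminal state.
        rollout = sum(stage * c ** (2 * j) for j in range(n_steps))
        p = rollout + c ** (2 * n_steps) * p
        history.append(p)
    return p, history


if __name__ == "__main__":
    for n in (1, 3):
        p, _ = multi_step_value_iteration(1.0, 1.0, 1.0, 1.0, n_steps=n)
        print(n, p)
```

For a = b = q = r = 1 the Riccati equation reduces to p**2 - p - 1 = 0, so both the one-step and the three-step iterations should approach the golden ratio, about 1.618.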
An adaptive learning tracking control scheme is developed for robotic manipulators by a synthesis of adaptive control and learning control approaches. The proposed controller possesses both adaptive and learning properties and is thereby able to handle robotic systems with both time-varying periodic uncertainties and time-invariant parameters. Theoretical proofs are established to show that the proposed controllers ensure asymptotic tracking performance. The effectiveness of the proposed approaches is validated through extensive numerical simulation results.
Millimeter wave (mmWave) is a potential solution for high data rate communication due to its large available bandwidth. However, it is challenging to perform beam tracking in vehicular mmWave communication systems due to high mobility and narrow beams. In this paper, an adaptive beam tracking algorithm is proposed to improve the network throughput performance while reducing the training signal overhead. In particular, based on the mobility prediction at the base station (BS), a novel frame structure with dynamic bundled timeslots is designed. Moreover, an actor-critic reinforcement learning based algorithm is proposed to jointly optimize the beam width and the number of bundled timeslots, which makes the beam tracking adapt to the changing environment. Simulation results demonstrate that, compared with the traditional full scan and Kalman filter based beam tracking algorithms, the proposed algorithm can improve the time-averaged throughput by 11.34% and 24.86%, respectively. With the newly designed frame structure, it also outperforms beam tracking with the conventional frame structure, especially in scenarios with a large range of vehicle speeds.
A self-adaptive large neighborhood search method for scheduling n jobs on m non-identical parallel machines with multiple time windows is presented. Another feature of the problem is oversubscription, namely that not all jobs can be scheduled within the specified scheduling horizons due to the limited machine capacity. The objective is thus to maximize the overall profit of the processed jobs while respecting machine constraints. A first-in-first-out heuristic is applied to find an initial solution, and then a large neighborhood search procedure is employed to relax and re-optimize cumbersome solutions. A machine learning mechanism is also introduced to converge on the most efficient neighborhoods for the problem. Extensive computational results are presented based on data from an application involving the daily observation scheduling of a fleet of earth observing satellites. The method rapidly solves most problem instances to optimality or near optimality and shows robust performance in sensitivity analysis.
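The abstract does not say which learning mechanism selects neighborhoods. A common choice in adaptive large neighborhood search is roulette-wheel selection over operator weights that are exponentially smoothed toward recent scores, and the Python sketch below assumes that scheme. The function names, the `reaction` parameter, and the fixed scores in `run_demo` are hypothetical and stand in for real destroy/repair operators and real solution improvements.

```python
import random


def select_operator(weights, rng):
    """Roulette-wheel selection: index i is drawn with probability
    weights[i] / sum(weights)."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def update_weight(weight, score, reaction=0.2):
    """Move an operator weight toward its latest score (exponential smoothing)."""
    return (1.0 - reaction) * weight + reaction * score


def run_demo(iterations=200, seed=0):
    """Toy loop with two hypothetical neighborhoods and fixed scores."""
    rng = random.Random(seed)
    scores = [1.0, 0.1]   # assumed: neighborhood 0 is consistently more useful
    weights = [1.0, 1.0]  # start with no preference
    for _ in range(iterations):
        i = select_operator(weights, rng)
        weights[i] = update_weight(weights[i], scores[i])
    return weights


if __name__ == "__main__":
    print(run_demo())
```

In this toy setting the weight of the weaker neighborhood decays toward its score each time it is picked, so it is selected less and less often while still keeping a nonzero chance of being tried.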
In machinery fault diagnosis, labeled data are always difficult or even impossible to obtain. Transfer learning can leverage related fault diagnosis knowledge from a fully labeled source domain to enhance the fault diagnosis performance in a sparsely labeled or unlabeled target domain, and it has been widely used for cross-domain fault diagnosis. However, existing methods focus on either marginal distribution adaptation (MDA) or conditional distribution adaptation (CDA). In practice, marginal and conditional distribution discrepancies both have significant but different influences on the domain divergence. In this paper, a dynamic distribution adaptation based transfer network (DDATN) is proposed for cross-domain bearing fault diagnosis. DDATN utilizes the proposed instance-weighted dynamic maximum mean discrepancy (IDMMD) for dynamic distribution adaptation (DDA), which can dynamically estimate the influences of the marginal and conditional distributions and adapt the target domain to the source domain. The experimental evaluation on cross-domain bearing fault diagnosis demonstrates that DDATN can outperform the state-of-the-art cross-domain fault diagnosis methods.
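To make the idea of weighting marginal against conditional discrepancy concrete, here is a minimal Python sketch built on the plain (biased) squared maximum mean discrepancy with a Gaussian kernel. It is not the paper's IDMMD: the instance weighting is omitted, the trade-off `mu` is passed in by hand instead of being estimated dynamically, and all function names and the `bandwidth` parameter are assumptions for illustration.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length feature tuples."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples xs and ys."""
    def mean_kernel(a_set, b_set):
        total = sum(gaussian_kernel(a, b, bandwidth) for a in a_set for b in b_set)
        return total / (len(a_set) * len(b_set))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def weighted_discrepancy(src, src_labels, tgt, tgt_pseudo_labels, mu, bandwidth=1.0):
    """(1 - mu) * marginal MMD + mu * sum of class-wise (conditional) MMDs.

    tgt_pseudo_labels are assumed to come from some classifier, since the
    target domain is unlabeled; mu in [0, 1] is a hand-set trade-off here.
    """
    marginal = mmd2(src, tgt, bandwidth)
    conditional = 0.0
    # Only classes present in both domains can be compared.
    for c in sorted(set(src_labels) & set(tgt_pseudo_labels)):
        src_c = [x for x, y in zip(src, src_labels) if y == c]
        tgt_c = [x for x, y in zip(tgt, tgt_pseudo_labels) if y == c]
        conditional += mmd2(src_c, tgt_c, bandwidth)
    return (1.0 - mu) * marginal + mu * conditional
```

Setting `mu=0` recovers pure marginal adaptation and `mu=1` pure conditional adaptation, which is the trade-off the abstract says existing methods fix at one extreme.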
As a new mode and means of smart manufacturing, smart cloud manufacturing (SCM) faces great challenges in massive supply and demand, dynamic resource collaboration, and intelligent adaptation. To address these challenges, this paper proposes an SCM-oriented dynamic supply-demand (SD) intelligent adaptation model for massive manufacturing services. In this model, a collaborative network model is established based on the properties of both supply and demand and their relationships. In addition, an algorithm based on deep graph clustering (DGC) and aligned sampling (AS) is used to divide and conquer the large adaptation domain, which addresses the slow computation caused by the high complexity of spatiotemporal search in the collaborative network model. At the same time, an intelligent supply-demand adaptation method driven by the quality of service (QoS) is established, in which the experiences of adaptation are shared among adaptation subdomains through deep reinforcement learning (DRL) powered by a transfer mechanism to improve the poor adaptation results caused by dynamic uncertainty. The results show that the proposed model and solution can perform collaborative and intelligent supply-demand adaptation for the massive and dynamic resources in SCM through autonomous learning and can effectively perform global supply-demand matching and optimal resource allocation.
The core task of tracking control is to make the controlled plant track a desired trajectory. The traditional performance index used in previous studies cannot completely eliminate the tracking error as the number of time steps increases. In this paper, a new cost function is introduced to develop the value-iteration-based adaptive critic framework to solve the tracking control problem. Unlike the regulator problem, the iterative value function of the tracking control problem cannot be regarded as a Lyapunov function. A novel stability analysis method is developed to guarantee that the tracking error converges to zero. The discounted iterative scheme under the new cost function for the special case of linear systems is elaborated. Finally, the tracking performance of the present scheme is demonstrated by numerical results and compared with that of the traditional approaches.
Developing advanced design techniques for feedback stabilization and optimization of complex systems is important to the modern control field. In this paper, a near-optimal regulation method for general nonaffine dynamics is developed with the help of policy learning. To address the nonaffine nonlinearity, a pre-compensator is constructed so that the augmented system can be formulated in an affine-like form. Different cost functions are defined for the original and transformed controlled plants, and their relationship is then analyzed in detail. Additionally, an adaptive critic algorithm with a stability guarantee is employed to solve the augmented optimal control problem. Finally, several case studies are conducted to verify the stability, robustness, and optimality of a torsional pendulum plant with a suitable cost.
The single gimbal control moment gyroscope (SGCMG), with high precision and fast response, is an important attitude control system for high-precision docking, rapid maneuvering, navigation, and guidance systems in the aerospace field. In this paper, considering the influence of multi-source disturbances, a data-based feedback relearning (FR) algorithm is designed for the robust control of the SGCMG gimbal servo system. Based on adaptive dynamic programming and the least-squares principle, the FR algorithm is used to obtain the servo control strategy by collecting online operation data of the SGCMG system. This is a model-free learning strategy in which no prior knowledge of the SGCMG model is required. Then, combined with the reinforcement learning mechanism, the servo control strategy interacts with the system dynamics of the SGCMG, and the adaptive evaluation and improvement of the servo control strategy against the multi-source disturbances are realized. Meanwhile, a data redistribution method based on experience replay is designed to reduce data correlation and thereby improve algorithm stability and data utilization efficiency. Finally, the effectiveness of the proposed servo control strategy is verified by comparison with other methods on the simulation model of the SGCMG.
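The abstract does not specify how the data redistribution method works beyond its basis in experience replay. As a generic point of reference, the Python sketch below shows a minimal uniform-sampling replay buffer, which breaks the temporal ordering of consecutive transitions by drawing random mini-batches. The class name `ReplayBuffer` and the `(state, action, cost, next_state)` tuple layout are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # deque with maxlen silently drops the oldest item when full.
        self._data = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, cost, next_state):
        """Store one transition collected during online operation."""
        self._data.append((state, action, cost, next_state))

    def sample(self, batch_size):
        """Draw up to batch_size distinct transitions uniformly at random,
        which weakens the correlation between consecutive samples."""
        batch_size = min(batch_size, len(self._data))
        return self._rng.sample(list(self._data), batch_size)

    def __len__(self):
        return len(self._data)
```

A least-squares update, as mentioned in the abstract, could then be fitted on each sampled batch instead of on the most recent, highly correlated measurements.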
Nonlinear loads in the power distribution system cause non-sinusoidal currents and voltages with harmonic components. Shunt active filters (SAF) with current controlled voltage source inverters (CCVSI) are usually used to obtain balanced and sinusoidal source currents by injecting compensation currents. However, CCVSI with traditional controllers have limited transient and steady-state performance. In this paper, we propose an adaptive dynamic programming (ADP) controller with online learning capability to improve the transient response and reduce harmonics. The proposed controller works alongside existing proportional integral (PI) controllers to efficiently track the reference currents in the d-q domain. It can generate adaptive control actions to compensate the PI controller. The proposed system was simulated under different nonlinear (three-phase full wave rectifier) load conditions. The performance of the proposed approach was compared with that of the traditional approach. We have also included the simulation results without connecting the traditional PI control based power inverter for reference. The online learning based ADP controller not only reduced the average total harmonic distortion by 18.41%, but also outperformed traditional PI controllers during transients.
In this study, we propose a compensated distributed adaptive learning algorithm for heterogeneous multi-agent systems with repetitive motion, where the leader's dynamics are unknown and the controlled system's parameters are uncertain. The multi-agent systems are treated as hybrid-order nonlinear systems, which relaxes the strict requirement in some existing work that all agents be of the same order. For the theoretical analysis, we design a composite energy function with virtual gain parameters to relax the restriction that the controller gain depends on global information. Considering the stability of the controller, we introduce a smooth continuous function to improve the piecewise controller and avoid possible chattering. Theoretical analyses prove the convergence of the presented algorithm, and simulation experiments verify its effectiveness.
In this article, a robot skills learning framework is developed, which considers both motion modeling and execution. To enable the robot to learn skills from demonstrations, a learning method called dynamic movement primitives (DMPs) is introduced to model motion. A staged teaching strategy is integrated into the DMP framework to enhance its generality, so that complicated tasks can also be performed by multi-joint manipulators. The DMP connection method is used to make an accurate and smooth transition in position and velocity space when connecting complex motion sequences. In addition, motions are categorized by goal and duration. Furthermore, an adaptive neural network (NN) control method is proposed to achieve highly accurate trajectory tracking and to ensure the performance of action execution, which improves the reliability of the skills learning system. Experimental tests on the Baxter robot verify the effectiveness of the proposed method.
This article introduces the state-of-the-art development of adaptive dynamic programming and reinforcement learning (ADPRL). First, algorithms in reinforcement learning (RL) are introduced and their roots in dynamic programming are illustrated. Adaptive dynamic programming (ADP) is then introduced following a brief discussion of dynamic programming. Researchers in ADP and RL have enjoyed the fast developments of the past decade, from algorithms to convergence and optimality analyses and to stability results. Several key steps in the recent theoretical developments of ADPRL are mentioned with some future perspectives. In particular, convergence and optimality results of value iteration and policy iteration are reviewed, followed by an introduction to the most recent results on stability analysis of value iteration algorithms.
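To address the problem that sensor-based activity recognition is usually limited to a single, fixed scene, a multi-scene sensor-based activity recognition transfer model is proposed. It consists of two parts, a sensor-based dynamic perception algorithm (DPA) and an adaptive scene human recognition (AHR) transfer method, which together address the dependence on sensors in a fixed scene and the failure of the recognition model when the scene changes. DPA adopts a two-stage transfer mode in which the activity recognition stage and the model transfer stage proceed in parallel, ensuring that the model retains its recognition ability after sensor changes occur. The AHR scene transfer method is further proposed to give the model activity recognition capability across multiple scenes. Experiments verify that the model has better adaptability and scalability.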
Funding (ADP and RL survey for advanced control): Supported in part by the National Natural Science Foundation of China (62222301, 62073085, 62073158, 61890930-5, 62021003), the National Key Research and Development Program of China (2021ZD0112302, 2021ZD0112301, 2018YFC1900800-5), and the Beijing Natural Science Foundation (JQ19013).
Funding (BYOD access control): Partly supported by the University of Malaya Impact Oriented Interdisciplinary Research Grant under Grant IIRG008(A,B,C)-19IISS.
Funding (integrated MsHDP): Supported by the National Key Research and Development Program of China (2021ZD0112302), the National Natural Science Foundation of China (62222301, 61890930-5, 62021003), and the Beijing Natural Science Foundation (JQ19013).
Funding (mmWave adaptive beam tracking): Supported by the National Key R&D Program of China (2020YFB1807204) and the Beijing Natural Science Foundation (L212003).
Funding (self-adaptive large neighborhood search scheduling): Supported by the National Natural Science Foundation of China (70601035, 70801062).
Funding (DDATN bearing fault diagnosis): Supported by the National Natural Science Foundation of China (Grant Nos. 51875208, 51475170) and the National Key Research and Development Program of China (Grant No. 2018YFB1702400).
Funding (SCM supply-demand adaptation): Supported in part by the National Natural Science Foundation of China under Grant 62172235, in part by the Natural Science Foundation of Jiangsu Province of China under Grant BK20191381, and in part by the Primary Research & Development Plan of Jiangsu Province under Grant BE2019742.
Funding (value-iteration-based tracking control): Supported in part by the Beijing Natural Science Foundation (JQ19013), the National Key Research and Development Program of China (2021ZD0112302), and the National Natural Science Foundation of China (61773373).
Funding: Supported in part by the National Natural Science Foundation of China (61773373, U1501251, 61533017), in part by the Young Elite Scientists Sponsorship Program by the China Association for Science and Technology, and in part by the Youth Innovation Promotion Association of the Chinese Academy of Sciences.
Abstract: Developing advanced design techniques for feedback stabilization and optimization of complex systems is important to the modern control field. In this paper, a near-optimal regulation method for general nonaffine dynamics is developed with the help of policy learning. To address the nonaffine nonlinearity, a pre-compensator is constructed so that the augmented system can be formulated in an affine-like form. Different cost functions are defined for the original and transformed controlled plants, and their relationship is analyzed in detail. Additionally, an adaptive critic algorithm with a stability guarantee is employed to solve the augmented optimal control problem. Finally, several case studies are conducted to verify the stability, robustness, and optimality of a torsional pendulum plant with a suitable cost.
Funding: This work was supported by the National Natural Science Foundation of China (No. 62022061), the Tianjin Natural Science Foundation (No. 20JCYBJC00880), and the Beijing Key Laboratory Open Fund of Long-Life Technology of Precise Rotation and Transmission Mechanisms.
Abstract: The single gimbal control moment gyroscope (SGCMG), with high precision and fast response, is an important attitude control system for high-precision docking and for rapid-maneuvering navigation and guidance systems in the aerospace field. In this paper, considering the influence of multi-source disturbance, a data-based feedback relearning (FR) algorithm is designed for the robust control of the SGCMG gimbal servo system. Based on adaptive dynamic programming and the least-squares principle, the FR algorithm obtains the servo control strategy by collecting online operation data of the SGCMG system. This is a model-free learning strategy in which no prior knowledge of the SGCMG model is required. Then, through the reinforcement learning mechanism, the servo control strategy interacts with the SGCMG system dynamics, and the adaptive evaluation and improvement of the servo control strategy against the multi-source disturbance are realized. Meanwhile, a data redistribution method based on experience replay is designed to reduce data correlation and thereby improve algorithm stability and data utilization efficiency. Finally, the effectiveness of the proposed servo control strategy is verified by comparison with other methods on the SGCMG simulation model.
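The role of experience replay in reducing data correlation can be sketched with a generic buffer. This is only an assumed, textbook-style illustration and not the data redistribution method of the paper. The class name `ReplayBuffer`, the tuple layout, and the placeholder data are all hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, cost, next_state) tuples."""

    def __init__(self, capacity, seed=None):
        self._data = deque(maxlen=capacity)  # oldest samples are dropped first
        self._rng = random.Random(seed)

    def add(self, state, action, cost, next_state):
        self._data.append((state, action, cost, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up the temporal
        # ordering of consecutive measurements.
        batch_size = min(batch_size, len(self._data))
        return self._rng.sample(list(self._data), batch_size)

    def __len__(self):
        return len(self._data)


buffer = ReplayBuffer(capacity=100, seed=0)
for k in range(250):
    # Placeholder data standing in for logged servo measurements.
    buffer.add(state=float(k), action=0.0, cost=0.0, next_state=float(k + 1))
batch = buffer.sample(32)
```

A batch drawn this way mixes samples from different time instants, so a least-squares update fitted on it sees less strongly correlated data than one fitted on a consecutive window.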
Abstract: Nonlinear loads in the power distribution system cause non-sinusoidal currents and voltages with harmonic components. Shunt active filters (SAF) with current-controlled voltage source inverters (CCVSI) are usually used to obtain balanced and sinusoidal source currents by injecting compensation currents. However, CCVSI with traditional controllers have limited transient and steady-state performance. In this paper, we propose an adaptive dynamic programming (ADP) controller with online learning capability to improve the transient response and reduce harmonics. The proposed controller works alongside existing proportional-integral (PI) controllers to efficiently track the reference currents in the d-q domain. It can generate adaptive control actions to compensate the PI controller. The proposed system was simulated under different nonlinear (three-phase full-wave rectifier) load conditions, and its performance was compared with that of the traditional approach. Simulation results without the traditional PI-control-based power inverter connected are also included for reference. The online-learning-based ADP controller not only reduced the average total harmonic distortion by 18.41%, but also outperformed traditional PI controllers during transients.
Funding: The National Natural Science Foundation of China (Grant Nos. 62203342, 62073254, 92271101, 62106186, and 62103136), the Fundamental Research Funds for the Central Universities (Grant Nos. XJS220704, QTZX23003, and ZYTS23046), the Project Funded by China Postdoctoral Science Foundation (Grant No. 2022M712489), and the Natural Science Basic Research Program of Shaanxi (Grant No. 2023-JC-YB-585).
Abstract: In this study, we propose a compensated distributed adaptive learning algorithm for heterogeneous multi-agent systems with repetitive motion, where the leader's dynamics are unknown and the controlled system's parameters are uncertain. The multi-agent systems are treated as hybrid-order nonlinear systems, which relaxes the strict requirement in some existing work that all agents be of the same order. For the theoretical analysis, we design a composite energy function with virtual gain parameters to relax the restriction that the controller gain depends on global information. To ensure the stability of the controller, we introduce a smooth continuous function that replaces the piecewise controller and avoids possible chattering. Theoretical analysis proves the convergence of the presented algorithm, and simulation experiments verify its effectiveness.
Funding: The National Natural Science Foundation of China (Nos. 62225304, 92148204 and 62061160371); the National Key Research and Development Program of China (Nos. 2021ZD0114503 and 2019YFB1703600); the Beijing Top Discipline for Artificial Intelligence Science and Engineering, University of Science and Technology Beijing; and the Beijing Natural Science Foundation (No. JQ20026).
Abstract: In this article, a robot skills learning framework is developed that considers both motion modeling and execution. To enable the robot to learn skills from demonstrations, a learning method called dynamic movement primitives (DMPs) is introduced to model motion. A staged teaching strategy is integrated into the DMP framework to enhance generality, so that complicated tasks can also be performed by multi-joint manipulators. The DMP connection method is used to make an accurate and smooth transition in position and velocity space when connecting complex motion sequences. In addition, motions are categorized by goal and duration. Notably, an adaptive neural network (NN) control method is proposed to achieve highly accurate trajectory tracking and to ensure the performance of action execution, which improves the reliability of the skills learning system. Experiments on the Baxter robot verify the effectiveness of the proposed method.
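How a dynamic movement primitive generates a motion can be sketched with the standard one-dimensional discrete formulation: a spring-damper system pulled toward the goal, plus a learned forcing term that is gated by a decaying phase variable. The sketch below is a minimal illustration under assumed gains, basis layout, and hand-picked weights. It does not include the staged teaching strategy, the DMP connection method, or the NN controller from the abstract, and the function name `rollout_dmp` is hypothetical.

```python
import math


def rollout_dmp(y0, goal, weights, tau=1.0, duration=3.0, dt=0.001,
                alpha_z=25.0, alpha_x=4.0):
    """Integrate a one-dimensional discrete DMP with explicit Euler steps.

    tau * dz/dt = alpha_z * (beta_z * (goal - y) - z) + f(x)
    tau * dy/dt = z
    tau * dx/dt = -alpha_x * x          (canonical system)
    """
    beta_z = alpha_z / 4.0  # critically damped spring-damper
    n = len(weights)
    # Basis centers spread along the decaying phase variable x (assumed layout).
    centers = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    widths = [n ** 1.5 / c / alpha_x for c in centers]

    y, z, x = y0, 0.0, 1.0
    trajectory = [y]
    for _ in range(round(duration / dt)):
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centers)]
        # Forcing term: normalized weighted basis, gated by x so it fades out.
        f = sum(p * w for p, w in zip(psi, weights)) / (sum(psi) + 1e-10)
        f *= x * (goal - y0)
        dz = (alpha_z * (beta_z * (goal - y) - z) + f) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z, y, x = z + dz * dt, y + dy * dt, x + dx * dt
        trajectory.append(y)
    return trajectory


# Arbitrary weights shape the transient; the end point is still the goal.
path = rollout_dmp(y0=0.0, goal=1.0, weights=[50.0, -30.0, 20.0, 0.0, 10.0])
```

Because the forcing term is multiplied by the phase variable, it vanishes as the phase decays, so the trajectory ends at the goal regardless of the weights. In learning from demonstration the weights would be fitted to recorded motion instead of being chosen by hand.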
Abstract: This article introduces the state-of-the-art development of adaptive dynamic programming and reinforcement learning (ADPRL). First, algorithms in reinforcement learning (RL) are introduced and their roots in dynamic programming are illustrated. Adaptive dynamic programming (ADP) is then introduced following a brief discussion of dynamic programming. Over the past decade, research in ADP and RL has developed rapidly, from algorithms to convergence and optimality analyses and on to stability results. Several key steps in the recent theoretical developments of ADPRL are mentioned, along with some future perspectives. In particular, convergence and optimality results of value iteration and policy iteration are reviewed, followed by an introduction to the most recent results on the stability analysis of value iteration algorithms.
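The difference between value iteration and policy iteration can be made concrete on a scalar linear-quadratic problem, where the value function is `p * x**2` and both schemes reduce to simple recursions. This is a minimal sketch under assumed plant and cost parameters, not an algorithm taken from the article. Value iteration starts from a zero value function, while policy iteration requires an initial stabilizing gain.

```python
def value_iteration(a, b, q, r, iterations=200):
    """Riccati-type value iteration for V_i(x) = p_i * x**2, starting at p_0 = 0."""
    p = 0.0
    for _ in range(iterations):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p


def policy_iteration(a, b, q, r, k0, iterations=20):
    """Policy iteration for u = -k * x, starting from a stabilizing gain k0."""
    k = k0
    p = None
    for _ in range(iterations):
        a_cl = a - b * k
        if abs(a_cl) >= 1.0:
            raise ValueError("policy iteration needs a stabilizing gain")
        # Policy evaluation: p solves p = q + r*k**2 + a_cl**2 * p.
        p = (q + r * k * k) / (1.0 - a_cl * a_cl)
        # Policy improvement: greedy gain with respect to p * x**2.
        k = a * b * p / (r + b * b * p)
    return p, k


a, b, q, r = 1.2, 1.0, 1.0, 1.0          # hypothetical open-loop unstable plant
p_vi = value_iteration(a, b, q, r)
p_pi, k_pi = policy_iteration(a, b, q, r, k0=1.0)
```

For this example both recursions settle on the same solution of the discrete-time Riccati equation. Policy iteration gets there in a handful of steps but needs a stabilizing initial gain, whereas value iteration needs no such initialization and converges more slowly.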
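Abstract: To address the problem that sensor-based activity recognition is limited to single, fixed scenes, a sensor-based activity recognition transfer model for multiple scenes is proposed. It consists of two parts, a sensor-based dynamic perception algorithm (DPA) and an adaptive-scene activity recognition transfer method (adaptive scene human recognition, AHR), and it addresses both the dependence on sensors in fixed scenes and the failure of the recognition model when the scene changes. DPA introduces a two-stage transfer mode that advances the activity recognition stage and the model transfer stage in parallel, ensuring that the model retains its recognition ability after a sensor change occurs. The AHR scene transfer method is further proposed to give the model activity recognition capability across multiple scenes. Experiments verify that the model has better adaptability and scalability.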