Reinforcement learning (RL) has roots in dynamic programming and is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are displayed, where the main results for discrete-time systems and continuous-time systems are surveyed, respectively. Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environments is discussed, where event-based design, robust stabilization, and game design are reviewed. Moreover, the extensions of ADP for addressing control problems under complex environments have attracted enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they promote the ADP formulation significantly. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, this comprehensive survey on ADP and RL for advanced control applications demonstrates their remarkable potential in the artificial intelligence era, as well as their vital role in promoting environmental protection and industrial intelligence.
Aiming at the tracking problem of a class of discrete nonaffine nonlinear multi-input multi-output (MIMO) repetitive systems subjected to separable and nonseparable disturbances, a novel data-driven iterative learning control (ILC) scheme based on zeroing neural networks (ZNNs) is proposed. First, the equivalent dynamic linearization data model is obtained by means of dynamic linearization technology, which exists theoretically in the iteration domain. Then, an iterative extended state observer (IESO) is developed to estimate the disturbance and the coupling between systems, and the decoupled dynamic linearization model is obtained for the purpose of controller synthesis. To solve the zero-seeking tracking problem with inherent tolerance of noise, an ILC based on a noise-tolerant modified ZNN is proposed. The strict assumptions imposed on the initialization conditions of each iteration in existing ILC methods can be entirely removed with this method. In addition, theoretical analysis indicates that the modified ZNN can converge to the exact solution of the zero-seeking tracking problem. Finally, a generalized example and an application-oriented example are presented to verify the effectiveness and superiority of the proposed scheme.
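The noise-tolerant zeroing idea above can be illustrated with a toy iteration-domain sketch (my own construction, not the paper's MIMO algorithm): on a scalar plant y = g·u with zero-mean measurement noise, adding an error-integral term to the zeroing update keeps driving the tracking error toward zero despite the noise.

```python
import numpy as np

def nt_znn_ilc(g=0.8, y_ref=1.0, lam1=0.9, lam2=0.05, iters=60, noise=0.01, seed=0):
    """Iteration-domain zeroing update with an error-integral term (the source
    of the noise tolerance) on a toy scalar plant y = g*u + measurement noise.
    Illustrative only; the paper's modified ZNN handles the full MIMO case."""
    rng = np.random.default_rng(seed)
    u, e_sum, errs = 0.0, 0.0, []
    for _ in range(iters):
        y = g * u + noise * rng.standard_normal()   # noisy measurement
        e = y_ref - y                               # error to be zeroed
        e_sum += e                                  # integral of past errors
        u += (lam1 * e + lam2 * e_sum) / g          # noise-tolerant zeroing update
        errs.append(abs(e))
    return errs

errs = nt_znn_ilc()
print(f"|e|: {errs[0]:.3f} -> {errs[-1]:.3f}")
```

Without the lam2 integral term, a pure proportional update settles at an error floor set by the noise; the integral term averages the noise out across iterations.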
In recent times, various power control and clustering approaches have been proposed to enhance the overall performance of cell-free massive multiple-input multiple-output (CF-mMIMO) networks. With the emergence of deep reinforcement learning (DRL), significant progress has been made in the field of network optimization, as DRL holds great promise for improving network performance and efficiency. In this work, we address the intricate challenge of joint cooperation clustering and downlink power control within CF-mMIMO networks. Leveraging the deep deterministic policy gradient (DDPG) algorithm, our objective is to maximize the proportional fairness (PF) of user rates, thereby achieving optimal network performance and resource utilization. Moreover, we harness a "divide and conquer" strategy, introducing two methods termed alternating DDPG (A-DDPG) and hierarchical DDPG (H-DDPG). These approaches decompose the intricate joint optimization problem into more manageable sub-problems, thereby facilitating a more efficient resolution process. Our findings unequivocally showcase the superior efficacy of the proposed DDPG approach over the baseline schemes in both clustering and downlink power control. Furthermore, A-DDPG and H-DDPG obtain higher performance gains than DDPG with lower computational complexity.
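DDPG and its variants rely on slowly tracking target networks. A minimal sketch of the Polyak ("soft") target update that DDPG uses, with network weights represented as plain lists of floats purely for illustration:

```python
def soft_update(target, online, tau=0.1):
    """Polyak (soft) target update used by DDPG:
    theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]

target = [0.0, 1.0]
online = [1.0, 0.0]
for _ in range(3):          # the target drifts slowly toward the online network
    target = soft_update(target, online, tau=0.1)
print(target)
```

The small tau keeps the bootstrapped critic targets slowly varying, which is what stabilizes DDPG training; A-DDPG and H-DDPG inherit this mechanism per sub-problem.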
This paper focuses on the development of a learning-based controller for a class of uncertain mechanical systems modeled by the Euler-Lagrange formulation. The considered system can depict the behavior of a large class of engineering systems, such as vehicular systems, robot manipulators, and satellites. All these systems are often characterized by highly nonlinear dynamics, heavy modeling uncertainties, and unknown perturbations; therefore, accurate-model-based nonlinear control approaches become unavailable. Motivated by this challenge, a reinforcement learning (RL) adaptive control methodology based on the actor-critic framework is investigated to compensate for the uncertain mechanical dynamics. The approximation inaccuracies caused by RL and the exogenous unknown disturbances are circumvented via a continuous robust integral of the sign of the error (RISE) control approach. Different from a classical RISE control law, a tanh(·) function is utilized instead of a sign(·) function to acquire a smoother control signal. The developed controller requires very little prior knowledge of the dynamic model, is robust to unknown dynamics and exogenous disturbances, and can achieve asymptotic output tracking. Eventually, co-simulations through ADAMS and MATLAB/Simulink on a three-degrees-of-freedom (3-DOF) manipulator and experiments on a real-time electromechanical servo system are performed to verify the performance of the proposed approach.
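The tanh-for-sign substitution can be illustrated on toy first-order error dynamics (an illustrative stand-in of mine, not the paper's closed-loop system): the smooth term rejects a bounded constant disturbance down to a small residual of order 1/k instead of chattering at the origin.

```python
import numpy as np

def tanh_rise_sim(beta=2.0, k=20.0, d=1.0, dt=1e-3, steps=2000):
    """Toy first-order error dynamics e' = -beta * tanh(k * e) + d.
    The tanh term plays the role of the smoothed sign(.) in a RISE-style law:
    for |e| >> 1/k it acts like beta*sign(e); near zero it is smooth, so the
    error settles at a small positive residual instead of chattering."""
    e = 1.0
    for _ in range(steps):
        e += dt * (-beta * np.tanh(k * e) + d)   # d is a bounded disturbance
    return e

e_final = tanh_rise_sim()
print(f"residual error with tanh feedback: {e_final:.4f}")
```

Sharpening the slope k shrinks the residual (the equilibrium satisfies tanh(k·e) = d/beta), which is the usual trade-off when replacing a discontinuous robust term by a smooth one.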
For unachievable tracking problems, where the system output cannot precisely track a given reference, achieving the best possible approximation of the reference trajectory becomes the objective. This study investigates solutions using the P-type learning control scheme. Initially, we demonstrate the necessity of gradient information for achieving the best approximation. Subsequently, we propose an input-output-driven learning gain design to handle the imprecise gradients of a class of uncertain systems. However, it is discovered that the desired performance may not be attainable when faced with incomplete information. To address this issue, an extended iterative learning control scheme is introduced. In this scheme, the tracking errors are modified through output data sampling, which incorporates low-memory footprints and offers flexibility in learning gain design. The input sequence is shown to converge towards the desired input, resulting in an output that is closest to the given reference in the least-squares sense. Numerical simulations are provided to validate the theoretical findings.
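A minimal P-type ILC sketch on a lifted linear system (my own toy setup with an achievable reference, not the paper's uncertain-system design) shows the iteration-domain update u_{k+1} = u_k + gain·e_k contracting the tracking error when |1 − gain·g[0]| < 1:

```python
import numpy as np

def p_type_ilc(g=(1.0, 0.5, 0.25), N=8, gain=0.6, iters=40):
    """P-type ILC u_{k+1} = u_k + gain * e_k on a lifted SISO system y = G u,
    with G a lower-triangular Toeplitz matrix built from impulse-response
    samples g. Convergence requires |1 - gain * g[0]| < 1."""
    G = np.zeros((N, N))
    for i in range(N):
        for j in range(max(0, i - len(g) + 1), i + 1):
            G[i, j] = g[i - j]
    y_ref = np.ones(N)                    # reference over the finite trial horizon
    u = np.zeros(N)
    errs = []
    for _ in range(iters):
        e = y_ref - G @ u
        errs.append(float(np.linalg.norm(e)))
        u = u + gain * e                  # iteration-domain P-type update
    return errs

errs = p_type_ilc()
print(f"error norm: {errs[0]:.3f} -> {errs[-1]:.2e}")
```

For an unachievable reference (y_ref outside the range of G), the same recursion plateaus at the least-squares residual rather than converging to zero, which is the regime the paper studies.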
Organizations are adopting the Bring Your Own Device (BYOD) concept to enhance productivity and reduce expenses. However, this trend introduces security challenges, such as unauthorized access. Traditional access control systems, such as Attribute-Based Access Control (ABAC) and Role-Based Access Control (RBAC), are limited in their ability to enforce access decisions due to the variability and dynamism of attributes related to users and resources. This paper proposes an adaptable and dynamic method for enforcing access decisions, based on multilayer hybrid deep learning techniques, particularly the Tabular Deep Neural Network (Tabular DNN) method. This technique transforms all input attributes in an access request into a binary classification (allow or deny) using multiple layers, ensuring accurate and efficient access decision-making. The proposed solution was evaluated using the Kaggle Amazon access control policy dataset and demonstrated its effectiveness by achieving a 94% accuracy rate. Additionally, the proposed solution enhances the implementation of access decisions based on a variety of resource and user attributes while ensuring privacy through indirect communication with the Policy Administration Point (PAP). This solution significantly improves the flexibility of access control systems, making them more dynamic and adaptable to the evolving needs of modern organizations. Furthermore, it offers a scalable approach to managing the complexities associated with the BYOD environment, providing a robust framework for secure and efficient access management.
Reinforcement learning (RL) algorithms are expected to become the next generation of wind farm control methods. However, as wind farms continue to grow in size, the computational complexity of collective wind farm control will increase exponentially with the growth of the action and state spaces, limiting its potential in practical applications. In this Letter, we employ an RL-based wind farm control approach with multi-agent deep deterministic policy gradient to optimize the yaw manoeuvre of grouped wind turbines in wind farms. To reduce the computational complexity, the turbines in the wind farm are grouped according to the strength of the wake interaction. Meanwhile, to improve the control efficiency, each subgroup is treated as a whole and controlled by a single agent. Optimized results show that the proposed method can not only increase the power production of the wind farm but also significantly improve the control efficiency.
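The grouping step can be sketched as a greedy clustering (a simplified construction of mine: wind blowing along +x and grouping by lateral offset only, whereas the Letter groups by the actual wake-interaction strength):

```python
import numpy as np

def group_by_wake(xy, lateral_tol=1.5):
    """Greedy grouping for wind along +x: turbines whose y-offsets (in rotor
    diameters) lie within lateral_tol of a group's seed are assumed to share a
    wake column and are therefore handled by a single agent."""
    xy = np.asarray(xy, dtype=float)
    seeds, labels = [], []
    for y in xy[:, 1]:
        for g, ref in enumerate(seeds):
            if abs(y - ref) <= lateral_tol:
                labels.append(g)
                break
        else:                       # no existing group close enough: open a new one
            seeds.append(y)
            labels.append(len(seeds) - 1)
    return labels

# two wake columns: three aligned turbines, then two more offset far to the side
labels = group_by_wake([(0, 0), (5, 0.5), (10, 0.2), (0, 6), (5, 6.4)])
print(labels)
```

Each resulting label would then index one agent in the multi-agent DDPG scheme, shrinking the joint action space from per-turbine to per-group.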
This paper examines the difficulties of managing distributed power systems, notably due to the increasing use of renewable energy sources, and focuses on voltage control challenges exacerbated by their variable nature in modern power grids. To tackle the unique challenges of voltage control in distributed renewable energy networks, researchers are increasingly turning towards multi-agent reinforcement learning (MARL). However, MARL raises safety concerns due to the unpredictability of agent actions during their exploration phase, which can lead to unsafe control measures. To mitigate these safety concerns in MARL-based voltage control, our study introduces a novel approach: Safety-Constrained Multi-Agent Reinforcement Learning (SC-MARL). This approach incorporates a specialized safety constraint module designed for voltage control within the MARL framework, ensuring that the MARL agents carry out voltage control actions safely. The experiments demonstrate that, in the 33-bus, 141-bus, and 322-bus power systems, employing SC-MARL for voltage control reduced the Voltage Out of Control Rate (%V.out) from 0.43, 0.24, and 2.95 to 0, 0.01, and 0.03, respectively. Additionally, the Reactive Power Loss (Q loss) decreased from 0.095, 0.547, and 0.017 to 0.062, 0.452, and 0.016 in the corresponding systems.
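One simple way to realize a safety-constraint module of the kind described (purely illustrative; the paper's module is more elaborate than this scalar clip) is to project each exploratory action so that a linearized voltage prediction stays within limits:

```python
import numpy as np

def safe_project(action, v_now, sensitivity, v_min=0.95, v_max=1.05):
    """Clip a scalar reactive-power action so that the linearized prediction
    v_next = v_now + sensitivity * action stays inside [v_min, v_max].
    A toy stand-in for a safety-constraint module; real modules would use
    full power-flow sensitivities across all buses."""
    lo = (v_min - v_now) / sensitivity
    hi = (v_max - v_now) / sensitivity
    return float(np.clip(action, min(lo, hi), max(lo, hi)))

raw = 0.8                                   # unsafe exploratory action from the agent
safe = safe_project(raw, v_now=1.04, sensitivity=0.05)
print(f"raw={raw}, safe={safe:.3f}, v_next={1.04 + 0.05 * safe:.3f}")
```

Because the projection is applied outside the learner, exploration remains unrestricted in the agent's own action space while the grid only ever sees feasible set-points.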
This paper proposes a modified iterative learning control (MILC) periodical feedback-feedforward algorithm to reduce the vibration of a rotor caused by coupled unbalance and parallel misalignment. The control of the rotor vibration is provided by an active magnetic actuator (AMA). The iterative gain of the MILC algorithm presented here is self-adjusted based on the magnitude of the vibration. Notch filters are adopted to extract the synchronous (1×Ω) and twice-rotational-frequency (2×Ω) components of the rotor vibration. Both the notch frequency of the filter and the size of the feedforward storage used during the experiment adapt in real time to the rotational speed. The method proposed in this work can provide effective suppression of rotor vibration in case of sudden changes or fluctuations of the rotor speed. Simulations and experiments using the proposed MILC algorithm are carried out and give evidence of the feasibility and robustness of the technique.
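The quantity the notch filters isolate, the 1×Ω and 2×Ω components of the vibration signal, can be sketched by synchronous least-squares demodulation (my own stand-in; the paper uses adaptive notch filters whose center frequency tracks the speed):

```python
import numpy as np

def extract_sync(y, t, omega):
    """Least-squares fit of a*cos(omega*t) + b*sin(omega*t) to the signal y:
    the amplitude pair (a, b) is the synchronous component at frequency omega,
    i.e. what a notch filter at omega would isolate from the rotor vibration."""
    A = np.column_stack([np.cos(omega * t), np.sin(omega * t)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b

t = np.linspace(0, 1, 2000)
Omega = 2 * np.pi * 30                  # hypothetical 30 Hz rotational speed
rng = np.random.default_rng(1)
y = (1.5 * np.cos(Omega * t)            # 1x (unbalance-like) component
     + 0.4 * np.sin(2 * Omega * t)      # 2x (misalignment-like) component
     + 0.01 * rng.standard_normal(t.size))
a1, b1 = extract_sync(y, t, Omega)      # 1x component
a2, b2 = extract_sync(y, t, 2 * Omega)  # 2x component
print(round(a1, 2), round(b2, 2))
```

These per-frequency amplitudes are exactly the signals the feedforward branch of such an algorithm would learn to cancel iteration by iteration.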
In this paper, platoons of autonomous vehicles operating in urban road networks are considered. From a methodological point of view, the problem of interest consists of formally characterizing vehicle state trajectory tubes by means of routing decisions complying with traffic congestion criteria. To this end, a novel distributed control architecture is conceived by taking advantage of two methodologies: deep reinforcement learning and model predictive control. On one hand, the routing decisions are obtained by using a distributed reinforcement learning algorithm that exploits available traffic data at each road junction. On the other hand, a bank of model predictive controllers is in charge of computing the most adequate control action for each involved vehicle. Such tasks are combined into a single framework: the deep reinforcement learning output (action) is translated into a set-point to be tracked by the model predictive controller; conversely, the current vehicle position, resulting from the application of the control move, is exploited by the deep reinforcement learning unit for improving its reliability. The main novelty of the proposed solution lies in its hybrid nature: on one hand it fully exploits deep reinforcement learning capabilities for decision-making purposes; on the other hand, time-varying hard constraints are always satisfied during the dynamical platoon evolution imposed by the computed routing decisions. To efficiently evaluate the performance of the proposed control architecture, a co-design procedure, involving the SUMO and MATLAB platforms, is implemented so that complex operating environments can be used, and the information coming from road maps (links, junctions, obstacles, semaphores, etc.) and vehicle state trajectories can be shared and exchanged. Finally, by considering as operating scenario a real entire city block and a platoon of eleven vehicles described by double-integrator models, several simulations have been performed with the aim of highlighting the main features of the proposed approach. Moreover, in different operating scenarios the proposed reinforcement learning scheme is capable of significantly reducing traffic congestion phenomena when compared with well-reputed competitors.
This article studies the effective traffic signal control problem of multiple intersections in a city-level traffic system. A novel regional multi-agent cooperative reinforcement learning algorithm called RegionSTLight is proposed to improve traffic efficiency. First, a regional multi-agent Q-learning framework is proposed, which can equivalently decompose the global Q value of the traffic system into the local values of several regions. Based on this framework and the idea of human-machine cooperation, a dynamic zoning method is designed to divide the traffic network into several strongly coupled regions according to real-time traffic flow densities. In order to achieve better cooperation inside each region, a lightweight spatio-temporal fusion feature extraction network is designed. Experiments in synthetic, real-world, and city-level scenarios show that the proposed RegionSTLight converges more quickly, is more stable, and obtains better asymptotic performance compared to state-of-the-art models.
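The regional decomposition, with the global Q value being the sum of per-region local values, admits a very small tabular sketch (an illustrative toy of mine; RegionSTLight itself uses the learned spatio-temporal network, not tables):

```python
def regional_q_update(q_tables, obs, acts, rewards, next_obs, alpha=0.1, gamma=0.9):
    """One sweep of regional Q-learning: each region updates only its own table
    from its local transition; the global Q value is, by the decomposition
    described above, implicitly the sum of these local values."""
    for r, q in enumerate(q_tables):
        s, a, rew, s2 = obs[r], acts[r], rewards[r], next_obs[r]
        best_next = max(q[s2].values()) if q.get(s2) else 0.0
        q.setdefault(s, {}).setdefault(a, 0.0)
        q[s][a] += alpha * (rew + gamma * best_next - q[s][a])
    return q_tables

# two hypothetical regions produced by the dynamic zoning step
tables = regional_q_update([{}, {}], obs=["s0", "s0"], acts=["go", "stop"],
                           rewards=[1.0, -1.0], next_obs=["s1", "s1"])
print(tables)
```

Because each region only touches its own table, the update cost scales with the number of regions rather than with the joint state-action space of the whole network.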
In this paper, a new optimal adaptive backstepping control approach for nonlinear systems under deception attacks via reinforcement learning is presented. The existence of nonlinear terms in the studied system makes it very difficult to design the optimal controller using traditional methods. To achieve optimal control, an RL algorithm based on the critic-actor architecture is considered for the nonlinear system. Due to the significant security risks of network transmission, the system is vulnerable to deception attacks, which can make all the system states unavailable. By using the attacked states to design the coordinate transformation, the harm brought by unknown deception attacks is overcome. The presented control strategy can ensure that all signals in the closed-loop system are semi-globally ultimately bounded. Finally, a simulation experiment is presented to prove the effectiveness of the strategy.
Objective To explore the changes in spatial learning performance and long-term potentiation (LTP), which is recognized as a component of the cellular basis of learning and memory, in normal and lead-exposed rats after administration of melatonin (MT) for two months. Methods The experiment was performed in adult male Wistar rats (12 controls, 12 exposed to melatonin treatment, 10 exposed to lead, and 10 exposed to lead and melatonin treatment). The lead-exposed rats received 0.2% lead acetate solution from their birth day while the control rats drank tap water. Melatonin (3 mg/kg) or vehicle was administered to the control and lead-exposed rats from the time of their weaning by gastric gavage each day for 60 days, depending on their groups. At the age of 81-90 days, all the animals were subjected to the Morris water maze test and then used for extracellular recording of LTP in the dentate gyrus (DG) area of the hippocampus in vivo. Results A low dose of melatonin given from weaning for two months impaired LTP in the DG area of the hippocampus and induced learning and memory deficits in the control rats. When melatonin was administered over a prolonged period to the lead-exposed rats, it exacerbated the LTP impairment and the learning and memory deficits induced by lead. Conclusion Melatonin is not suitable for normal and lead-exposed children.
In this paper, coordinated control of multiple robot manipulators holding a rigid object is discussed. In consideration of the inaccuracy of the dynamic model of a multiple-manipulator system, the error equations for object position and internal force are derived. Then a hybrid position/force coordinated learning control scheme is presented and its convergence is proved. The scheme can improve system performance by modifying the control input of the system after each learning iteration. Simulation results of two planar robot manipulators holding an object show the effectiveness of this control scheme.
Static Poisson's ratio (vs) is crucial for determining geomechanical properties in petroleum applications, namely sand production. Some models have been used to predict vs; however, the published models were limited to specific data ranges with an average absolute percentage relative error (AAPRE) of more than 10%. The published gated recurrent unit (GRU) models do not consider trend analysis to show physical behaviors. In this study, we aim to develop a GRU model using trend analysis and three inputs for predicting vs based on a broad range of data: vs (0.1627-0.4492), bulk formation density (RHOB) (0.315-2.994 g/mL), compressional time (DTc) (44.43-186.9 μs/ft), and shear time (DTs) (72.9-341.2 μs/ft). The GRU model was evaluated using different approaches, including statistical error analyses. The GRU model showed the proper trends, and the model data ranges were wider than previous ones. The GRU model has the largest correlation coefficient (R) of 0.967 and the lowest AAPRE, average percent relative error (APRE), root mean square error (RMSE), and standard deviation (SD) of 3.228%, 1.054%, 4.389, and 0.013, respectively, compared to other models. The GRU model has high accuracy for the different datasets: training, validation, testing, and the whole dataset, with R and AAPRE values of 0.981 and 2.601%, 0.966 and 3.274%, 0.967 and 3.228%, and 0.977 and 2.861%, respectively. The group error analyses of all inputs show that the GRU model has less than 5% AAPRE for all input ranges, which is superior to other models that have AAPRE values of more than 10% over various ranges of inputs.
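The error statistics used above to rank the models can be computed as follows (my reading of the standard definitions; the paper's exact conventions for sign and scaling may differ slightly):

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """APRE, AAPRE, and RMSE as commonly defined in petroleum-correlation
    studies: percent relative errors are taken with respect to y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rel = (y_true - y_pred) / y_true * 100.0
    apre = rel.mean()              # average percent relative error (signed)
    aapre = np.abs(rel).mean()     # average absolute percent relative error
    rmse = np.sqrt(((y_true - y_pred) ** 2).mean())
    return apre, aapre, rmse

# hypothetical measured vs. predicted static Poisson's ratio values
apre, aapre, rmse = error_metrics([0.25, 0.30, 0.40], [0.24, 0.33, 0.40])
print(f"APRE={apre:.2f}%  AAPRE={aapre:.2f}%  RMSE={rmse:.4f}")
```

Note that APRE is signed and can be small even when AAPRE is large (over- and under-predictions cancel), which is why both are reported together with RMSE.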
In this paper, a learning control approach is applied to the generalized projective synchronisation (GPS) of different chaotic systems with unknown periodically time-varying parameters. Using the Lyapunov-Krasovskii functional stability theory, a differential-difference mixed parametric learning law and an adaptive learning control law are constructed to make the states of two different chaotic systems asymptotically synchronised. The scheme is successfully applied to the generalized projective synchronisation between the Lorenz system and the Chen system. Moreover, numerical simulation results are used to verify the effectiveness of the proposed scheme.
In this paper, the stability of iterative learning control with data dropouts is discussed. By the super-vector formulation, an iterative learning control (ILC) system with data dropouts can be modeled as an asynchronous dynamical system with rate constraints on events in the iteration domain. The stability condition is provided in the form of linear matrix inequalities (LMIs), building on the stability theory of asynchronous dynamical systems. The analysis is supported by simulations.
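A quick simulation in the spirit of the super-vector model (my own toy plant and dropout model, not the paper's LMI analysis): each iteration, a random 0/1 mask decides which samples of the error actually arrive, and dropped samples simply skip their update that trial.

```python
import numpy as np

def ilc_with_dropouts(p_drop=0.3, gain=0.5, N=6, iters=80, seed=3):
    """Super-vector ILC u_{k+1} = u_k + gain * m_k * e_k, where m_k is a random
    0/1 dropout mask (1 = packet received). With a bounded dropout rate the
    error still contracts in the iteration domain, just more slowly."""
    rng = np.random.default_rng(seed)
    G = np.tril(np.ones((N, N)))           # toy lifted plant (step-response Toeplitz)
    y_ref = np.linspace(1.0, 2.0, N)
    u = np.zeros(N)
    for _ in range(iters):
        e = y_ref - G @ u
        mask = (rng.random(N) > p_drop).astype(float)
        u = u + gain * mask * e            # dropped samples keep their old input
    return float(np.linalg.norm(y_ref - G @ u))

print(f"final error norm with 30% dropouts: {ilc_with_dropouts():.2e}")
```

The iteration-domain map switches between the matrices (I − gain·G·diag(m)), which is exactly the asynchronous-switching structure the LMI condition certifies.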
The PD-type iterative learning control design of a class of affine nonlinear time-delay systems with external disturbances is considered. Sufficient conditions guaranteeing the convergence of the n-norm of the tracking error are derived. It is shown that the system outputs can be guaranteed to converge to the desired trajectories in the absence of external disturbances and output measurement noises. In the presence of state disturbances and measurement noises, the tracking error remains uniformly bounded. A numerical simulation example is presented to validate the effectiveness of the proposed scheme.
A new kind of volume-control hydraulic press that combines the advantages of both hydraulic and SRM (switched reluctance motor) driving technology is developed. Considering that serious dead zone and time-variant nonlinearity exist in the volume-control electro-hydraulic servo system, the ILC (iterative learning control) method is applied to tracking the displacement curve of the hydraulic press slider. In order to improve the convergence speed and precision of ILC, a fuzzy ILC algorithm that utilizes a fuzzy strategy to adaptively adjust the iterative learning gains is put forward. Simulation and experimental research is carried out to investigate the convergence speed and precision of the fuzzy ILC for hydraulic press slider position tracking. The results show that the fuzzy ILC can raise the iterative learning speed enormously and realize tracking control of the slider displacement curve with rapid response and high control precision. In the experiment, a maximum tracking error of 0.02 V is achieved after only 12 iterations.
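The gain-scheduling idea, a larger learning gain while the error is large and a smaller one near convergence, can be sketched with a single triangular membership function (an illustrative toy of mine on a scalar plant; the paper's fuzzy rule base and hydraulic dynamics are far richer):

```python
def fuzzy_gain(err, small=0.02, large=0.5, g_min=0.2, g_max=0.9):
    """Toy fuzzy-style scheduling: interpolate the learning gain between g_min
    (error fully 'small') and g_max (error fully 'large') via the degree of
    membership of |err| in the 'large' set."""
    mu_large = min(1.0, max(0.0, (abs(err) - small) / (large - small)))
    return g_min + (g_max - g_min) * mu_large

def fuzzy_ilc(plant_gain=0.8, y_ref=1.0, iters=12):
    """Scalar ILC with the fuzzy-adjusted gain: big corrections early,
    gentle ones once the error is small."""
    u, errs = 0.0, []
    for _ in range(iters):
        e = y_ref - plant_gain * u
        errs.append(abs(e))
        u += fuzzy_gain(e) / plant_gain * e
    return errs

errs = fuzzy_ilc()
print(f"|e|: {errs[0]:.3f} -> {errs[-1]:.4f}")
```

A fixed small gain would converge safely but slowly, and a fixed large gain would risk overshoot near the target; scheduling the gain on the error magnitude is what buys the fast convergence reported above.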
基金supported in part by the National Natural Science Foundation of China(62222301, 62073085, 62073158, 61890930-5, 62021003)the National Key Research and Development Program of China (2021ZD0112302, 2021ZD0112301, 2018YFC1900800-5)Beijing Natural Science Foundation (JQ19013)。
文摘Reinforcement learning(RL) has roots in dynamic programming and it is called adaptive/approximate dynamic programming(ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are displayed, where the main results towards discrete-time systems and continuous-time systems are surveyed, respectively.Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environment is discussed, respectively, where event-based design, robust stabilization, and game design are reviewed. Moreover, the extensions of ADP for addressing control problems under complex environment attract enormous attention. The ADP architecture is revisited under the perspective of data-driven and RL frameworks,showing how they promote ADP formulation significantly.Finally, several typical control applications with respect to RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, the comprehensive survey on ADP and RL for advanced control applications has d emonstrated its remarkable potential within the artificial intelligence era. In addition, it also plays a vital role in promoting environmental protection and industrial intelligence.
基金supported by the National Natural Science Foundation of China(U21A20166)in part by the Science and Technology Development Foundation of Jilin Province (20230508095RC)+1 种基金in part by the Development and Reform Commission Foundation of Jilin Province (2023C034-3)in part by the Exploration Foundation of State Key Laboratory of Automotive Simulation and Control。
文摘Aiming at the tracking problem of a class of discrete nonaffine nonlinear multi-input multi-output(MIMO) repetitive systems subjected to separable and nonseparable disturbances, a novel data-driven iterative learning control(ILC) scheme based on the zeroing neural networks(ZNNs) is proposed. First, the equivalent dynamic linearization data model is obtained by means of dynamic linearization technology, which exists theoretically in the iteration domain. Then, the iterative extended state observer(IESO) is developed to estimate the disturbance and the coupling between systems, and the decoupled dynamic linearization model is obtained for the purpose of controller synthesis. To solve the zero-seeking tracking problem with inherent tolerance of noise,an ILC based on noise-tolerant modified ZNN is proposed. The strict assumptions imposed on the initialization conditions of each iteration in the existing ILC methods can be absolutely removed with our method. In addition, theoretical analysis indicates that the modified ZNN can converge to the exact solution of the zero-seeking tracking problem. Finally, a generalized example and an application-oriented example are presented to verify the effectiveness and superiority of the proposed process.
基金supported by Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515012015supported in part by the National Natural Science Foundation of China under Grant 62201336+4 种基金in part by Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515011541supported in part by the National Natural Science Foundation of China under Grant 62371344in part by the Fundamental Research Funds for the Central Universitiessupported in part by Knowledge Innovation Program of Wuhan-Shuguang Project under Grant 2023010201020316in part by Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515010247。
文摘In recent times,various power control and clustering approaches have been proposed to enhance overall performance for cell-free massive multipleinput multiple-output(CF-mMIMO)networks.With the emergence of deep reinforcement learning(DRL),significant progress has been made in the field of network optimization as DRL holds great promise for improving network performance and efficiency.In this work,our focus delves into the intricate challenge of joint cooperation clustering and downlink power control within CF-mMIMO networks.Leveraging the potent deep deterministic policy gradient(DDPG)algorithm,our objective is to maximize the proportional fairness(PF)for user rates,thereby aiming to achieve optimal network performance and resource utilization.Moreover,we harness the concept of“divide and conquer”strategy,introducing two innovative methods termed alternating DDPG(A-DDPG)and hierarchical DDPG(H-DDPG).These approaches aim to decompose the intricate joint optimization problem into more manageable sub-problems,thereby facilitating a more efficient resolution process.Our findings unequivo-cally showcase the superior efficacy of our proposed DDPG approach over the baseline schemes in both clustering and downlink power control.Furthermore,the A-DDPG and H-DDPG obtain higher performance gain than DDPG with lower computational complexity.
基金supported in part by the National Key R&D Program of China under Grant 2021YFB2011300the National Natural Science Foundation of China under Grant 52075262。
Abstract: This paper mainly focuses on the development of a learning-based controller for a class of uncertain mechanical systems modeled by the Euler-Lagrange formulation. The considered system can depict the behavior of a large class of engineering systems, such as vehicular systems, robot manipulators, and satellites. All these systems are often characterized by highly nonlinear dynamics, heavy modeling uncertainties, and unknown perturbations, so accurate-model-based nonlinear control approaches become unavailable. Motivated by this challenge, a reinforcement learning (RL) adaptive control methodology based on the actor-critic framework is investigated to compensate for the uncertain mechanical dynamics. The approximation inaccuracies caused by RL and the exogenous unknown disturbances are circumvented via a continuous robust integral of the sign of the error (RISE) control approach. Different from a classical RISE control law, a tanh(·) function is utilized instead of a sign(·) function to acquire a smoother control signal. The developed controller requires very little prior knowledge of the dynamic model, is robust to unknown dynamics and exogenous disturbances, and can achieve asymptotic output tracking. Eventually, co-simulations through ADAMS and MATLAB/Simulink on a three-degrees-of-freedom (3-DOF) manipulator and experiments on a real-time electromechanical servo system are performed to verify the performance of the proposed approach.
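The sign-to-tanh smoothing can be illustrated in isolation: tanh(k·e) preserves the sign and saturation of sign(e) while staying continuous at e = 0, which removes control chattering. The gains below are illustrative, not the paper's:

```python
import math

def rise_correction(error, beta=1.0, k=10.0, smooth=True):
    """Robust correction term: beta*sign(e) classically,
    beta*tanh(k*e) in the smoothed variant."""
    if smooth:
        return beta * math.tanh(k * error)
    return beta * math.copysign(1.0, error) if error != 0 else 0.0
```

For large |e| the two terms nearly coincide; near zero the smoothed term passes through the origin instead of jumping between ±beta.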
Funding: supported by the National Natural Science Foundation of China (62173333, 12271522), the Beijing Natural Science Foundation (Z210002), and the Research Fund of Renmin University of China (2021030187).
Abstract: For unachievable tracking problems, where the system output cannot precisely track a given reference, achieving the best possible approximation of the reference trajectory becomes the objective. This study investigates solutions using the P-type learning control scheme. Initially, we demonstrate the necessity of gradient information for achieving the best approximation. Subsequently, we propose an input-output-driven learning gain design to handle the imprecise gradients of a class of uncertain systems. However, it is discovered that the desired performance may not be attainable when faced with incomplete information. To address this issue, an extended iterative learning control scheme is introduced. In this scheme, the tracking errors are modified through output data sampling, which incorporates low-memory footprints and offers flexibility in learning gain design. The input sequence is shown to converge towards the desired input, resulting in an output that is closest to the given reference in the least-squares sense. Numerical simulations are provided to validate the theoretical findings.
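A generic P-type learning update has the form u_{k+1}(t) = u_k(t) + L·e_k(t); a toy sketch on a static scalar plant (the plant and gain are illustrative, not from the paper):

```python
def p_type_ilc(plant, reference, gain, iterations):
    """Refine the input over iterations so the output tracks the reference."""
    u = [0.0] * len(reference)
    for _ in range(iterations):
        y = plant(u)
        e = [r - yi for r, yi in zip(reference, y)]   # tracking error
        u = [ui + gain * ei for ui, ei in zip(u, e)]  # P-type update
    return u

# Toy plant with static gain 2, so the ideal input is reference / 2;
# the error contracts by |1 - 2*gain| = 0.2 per iteration.
plant = lambda u: [2.0 * ui for ui in u]
ref = [1.0, 2.0, 3.0]
u = p_type_ilc(plant, ref, gain=0.4, iterations=50)
```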
Funding: partly supported by the University of Malaya Impact Oriented Interdisciplinary Research Grant under Grant IIRG008(A,B,C)-19IISS.
Abstract: Organizations are adopting the Bring Your Own Device (BYOD) concept to enhance productivity and reduce expenses. However, this trend introduces security challenges, such as unauthorized access. Traditional access control systems, such as Attribute-Based Access Control (ABAC) and Role-Based Access Control (RBAC), are limited in their ability to enforce access decisions due to the variability and dynamism of attributes related to users and resources. This paper proposes a method for enforcing access decisions that is adaptable and dynamic, based on multilayer hybrid deep learning techniques, particularly the Tabular Deep Neural Network (Tabular DNN) method. This technique transforms all input attributes in an access request into a binary classification (allow or deny) using multiple layers, ensuring accurate and efficient access decision-making. The proposed solution was evaluated using the Kaggle Amazon access control policy dataset and demonstrated its effectiveness by achieving a 94% accuracy rate. Additionally, the proposed solution enhances the implementation of access decisions based on a variety of resource and user attributes while ensuring privacy through indirect communication with the Policy Administration Point (PAP). This solution significantly improves the flexibility of access control systems, making them more dynamic and adaptable to the evolving needs of modern organizations. Furthermore, it offers a scalable approach to managing the complexities associated with the BYOD environment, providing a robust framework for secure and efficient access management.
Funding: supported by the National Natural Science Foundation of China (Grant No. 12388101), the Science Challenge Project, and the Anhui NARI Jiyuan Electric Power Grid Technology Co., Ltd. through the Joint Laboratory of USTC-NARI; the authors also acknowledge the advanced computing resources provided by the Supercomputing Center of the USTC.
Abstract: Reinforcement learning (RL) algorithms are expected to become the next generation of wind farm control methods. However, as wind farms continue to grow in size, the computational complexity of collective wind farm control will increase exponentially with the growth of action and state spaces, limiting its potential in practical applications. In this Letter, we employ an RL-based wind farm control approach with multi-agent deep deterministic policy gradient to optimize the yaw manoeuvre of grouped wind turbines in wind farms. To reduce the computational complexity, the turbines in the wind farm are grouped according to the strength of the wake interaction. Meanwhile, to improve the control efficiency, each subgroup is treated as a whole and controlled by a single agent. Optimized results show that the proposed method can not only increase the power production of the wind farm but also significantly improve the control efficiency.
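The grouping step can be sketched as finding connected components of a wake-interaction graph, linking any pair of turbines whose interaction strength exceeds a threshold; the matrix and threshold below are illustrative, not the paper's wake model:

```python
def group_turbines(wake, threshold):
    """Partition turbine indices into groups connected by strong wake interaction."""
    n = len(wake)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:                      # flood-fill one connected component
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            comp.append(i)
            stack.extend(j for j in range(n)
                         if j not in seen and wake[i][j] > threshold)
        groups.append(sorted(comp))
    return groups

# 4 turbines: pairs 0-1 and 2-3 strongly coupled, weak coupling elsewhere.
wake = [[0.0, 0.9, 0.1, 0.0],
        [0.9, 0.0, 0.0, 0.1],
        [0.1, 0.0, 0.0, 0.8],
        [0.0, 0.1, 0.8, 0.0]]
```

Each resulting group would then be handled by one agent, shrinking the joint action space from per-turbine to per-group.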
Funding: supported by the “Regional Innovation Strategy (RIS)” program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (MOE) (2021RIS-002).
Abstract: This paper examines the difficulties of managing distributed power systems, notably due to the increasing use of renewable energy sources, and focuses on voltage control challenges exacerbated by their variable nature in modern power grids. To tackle the unique challenges of voltage control in distributed renewable energy networks, researchers are increasingly turning towards multi-agent reinforcement learning (MARL). However, MARL raises safety concerns due to the unpredictability of agent actions during their exploration phase. This unpredictability can lead to unsafe control measures. To mitigate these safety concerns in MARL-based voltage control, our study introduces a novel approach: Safety-Constrained Multi-Agent Reinforcement Learning (SC-MARL). This approach incorporates a specialized safety constraint module specifically designed for voltage control within the MARL framework. This module ensures that the MARL agents carry out voltage control actions safely. The experiments demonstrate that, in the 33-bus, 141-bus, and 322-bus power systems, employing SC-MARL for voltage control reduced the voltage out-of-control rate (%V.out) from 0.43, 0.24, and 2.95 to 0, 0.01, and 0.03, respectively. Additionally, the reactive power loss (Q loss) decreased from 0.095, 0.547, and 0.017 to 0.062, 0.452, and 0.016 in the corresponding systems.
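One simple way to realize such a safety module is to project each agent's exploratory action onto a known admissible interval before it reaches the grid; a minimal sketch (the bounds are illustrative, and the paper's constraint design is more elaborate):

```python
def safe_action(raw, low, high):
    """Project a proposed control action onto the safe interval [low, high]."""
    return max(low, min(high, raw))

def safe_actions(raws, low, high):
    """Apply the safety projection to every agent's action."""
    return [safe_action(a, low, high) for a in raws]

# Three agents propose reactive-power set-points; two violate the bounds.
clipped = safe_actions([-1.7, 0.3, 2.4], -1.0, 1.0)
```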
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 51975037, 52375075).
Abstract: This paper proposes a modified iterative learning control (MILC) periodic feedback-feedforward algorithm to reduce the vibration of a rotor caused by coupled unbalance and parallel misalignment. Control of the rotor vibration is provided by an active magnetic actuator (AMA). The iterative gain of the MILC algorithm presented here is self-adjusting, based on the magnitude of the vibration. Notch filters are adopted to extract the synchronous (1×Ω) and twice-rotational-frequency (2×Ω) components of the rotor vibration. Both the notch frequency of the filter and the size of the feedforward storage used during the experiment adapt in real time to the rotational speed. The method proposed in this work can effectively suppress rotor vibration in case of sudden changes or fluctuations of the rotor speed. Simulations and experiments using the proposed MILC algorithm are carried out and demonstrate the feasibility and robustness of the technique.
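Extracting the synchronous (1×Ω) component amounts to correlating the vibration signal with sine and cosine at the rotational frequency, which is what a notch-filter extraction does at steady speed; the sampling setup below is illustrative, not the paper's filter design:

```python
import math

def sync_component(signal, omega, dt):
    """Amplitude of the 1-per-rev component via Fourier correlation."""
    n = len(signal)
    c = sum(x * math.cos(omega * k * dt) for k, x in enumerate(signal)) * 2 / n
    s = sum(x * math.sin(omega * k * dt) for k, x in enumerate(signal)) * 2 / n
    return math.hypot(c, s)

# Vibration with a 1x component of amplitude 0.5 plus a 2x component.
omega, dt, n = 10.0, 0.001, 10000
sig = [0.5 * math.sin(omega * k * dt) + 0.2 * math.sin(2 * omega * k * dt)
       for k in range(n)]
```

Running the same correlation at 2·omega would isolate the 2×Ω component used for the misalignment compensation.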
Abstract: In this paper, platoons of autonomous vehicles operating in urban road networks are considered. From a methodological point of view, the problem of interest consists of formally characterizing vehicle state trajectory tubes by means of routing decisions complying with traffic congestion criteria. To this end, a novel distributed control architecture is conceived by taking advantage of two methodologies: deep reinforcement learning and model predictive control. On one hand, the routing decisions are obtained by using a distributed reinforcement learning algorithm that exploits available traffic data at each road junction. On the other hand, a bank of model predictive controllers is in charge of computing the most adequate control action for each involved vehicle. Such tasks are here combined into a single framework: the deep reinforcement learning output (action) is translated into a set-point to be tracked by the model predictive controller; conversely, the current vehicle position, resulting from the application of the control move, is exploited by the deep reinforcement learning unit for improving its reliability. The main novelty of the proposed solution lies in its hybrid nature: on one hand it fully exploits deep reinforcement learning capabilities for decision-making purposes; on the other hand, time-varying hard constraints are always satisfied during the dynamical platoon evolution imposed by the computed routing decisions. To efficiently evaluate the performance of the proposed control architecture, a co-design procedure, involving the SUMO and MATLAB platforms, is implemented so that complex operating environments can be used, and the information coming from road maps (links, junctions, obstacles, semaphores, etc.) and vehicle state trajectories can be shared and exchanged.
Finally, by considering an entire real city block as the operating scenario and a platoon of eleven vehicles described by double-integrator models, several simulations have been performed with the aim of highlighting the main features of the proposed approach. Moreover, it is important to underline that in different operating scenarios the proposed reinforcement learning scheme is capable of significantly reducing traffic congestion phenomena when compared with well-reputed competitors.
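The double-integrator vehicle model used in such simulations has the standard discrete-time form p⁺ = p + vT + aT²/2, v⁺ = v + aT; a minimal sketch (the sampling time is illustrative):

```python
def double_integrator_step(p, v, a, T):
    """One discrete step of a double-integrator vehicle model."""
    return p + v * T + 0.5 * a * T * T, v + a * T

# Constant 1 m/s^2 acceleration for 1 s in steps of T = 0.1 s reproduces
# the continuous-time answer: p = 0.5 m, v = 1 m/s.
p, v = 0.0, 0.0
for _ in range(10):
    p, v = double_integrator_step(p, v, 1.0, 0.1)
```

The aT²/2 term makes the discretization exact for piecewise-constant acceleration, which is why this form is popular for MPC set-point tracking.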
Funding: supported by the National Science and Technology Major Project (2021ZD0112702), the National Natural Science Foundation of China (62373100, 62233003), and the Natural Science Foundation of Jiangsu Province of China (BK20202006).
Abstract: This article studies the effective traffic signal control problem of multiple intersections in a city-level traffic system. A novel regional multi-agent cooperative reinforcement learning algorithm called RegionSTLight is proposed to improve traffic efficiency. First, a regional multi-agent Q-learning framework is proposed, which can equivalently decompose the global Q value of the traffic system into the local values of several regions. Based on this framework and the idea of human-machine cooperation, a dynamic zoning method is designed to divide the traffic network into several strongly coupled regions according to real-time traffic flow densities. In order to achieve better cooperation inside each region, a lightweight spatio-temporal fusion feature extraction network is designed. Experiments in synthetic, real-world, and city-level scenarios show that the proposed RegionSTLight converges more quickly, is more stable, and obtains better asymptotic performance compared to state-of-the-art models.
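The regional decomposition can be sketched with a value-decomposition rule: the global Q value is taken as the sum of per-region Q values, so each region can act greedily on its own table; the structures below are illustrative, not the paper's networks:

```python
def global_q(regional_qs):
    """Value decomposition: the global Q is the sum of per-region Q values."""
    return sum(regional_qs)

def greedy_joint_action(per_region_action_values):
    """With an additive decomposition, the greedy joint action is found
    by maximizing each region's Q independently."""
    return [max(range(len(q)), key=q.__getitem__)
            for q in per_region_action_values]

# Two regions with two candidate signal plans each.
q_values = [[0.1, 0.9], [0.7, 0.2]]
```

Additivity is what makes the joint maximization tractable: the argmax factorizes across regions instead of searching the exponential joint action space.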
Funding: supported in part by the National Key R&D Program of China under Grant 2021YFE0206100; in part by the National Natural Science Foundation of China under Grant 62073321; in part by the National Defense Basic Scientific Research Program under Grant JCKY2019203C029; in part by the Science and Technology Development Fund, Macao SAR under Grants FDCT-22-009-MISE, 0060/2021/A2 and 0015/2020/AMJ; and in part by the National Defense Basic Scientific Research Project (JCKY2020130C025).
Abstract: In this paper, a new optimal adaptive backstepping control approach for nonlinear systems under deception attacks via reinforcement learning is presented. The existence of nonlinear terms in the studied system makes it very difficult to design the optimal controller using traditional methods. To achieve optimal control, an RL algorithm based on the critic-actor architecture is considered for the nonlinear system. Due to the significant security risks of network transmission, the system is vulnerable to deception attacks, which can make all the system states unavailable. By using the attacked states to design the coordinate transformation, the harm brought by unknown deception attacks is overcome. The presented control strategy can ensure that all signals in the closed-loop system are semi-globally ultimately bounded. Finally, a simulation experiment is shown to prove the effectiveness of the strategy.
Funding: supported by the National Basic Research Program of China (No. 2002CB512907) and the National Natural Science Foundation of China (No. 30630057).
Abstract: Objective To explore the changes in spatial learning performance and long-term potentiation (LTP), which is recognized as a component of the cellular basis of learning and memory, in normal and lead-exposed rats after administration of melatonin (MT) for two months. Methods The experiment was performed on adult male Wistar rats (12 controls, 12 exposed to melatonin treatment, 10 exposed to lead, and 10 exposed to lead and melatonin treatment). The lead-exposed rats received 0.2% lead acetate solution from birth, while the control rats drank tap water. Melatonin (3 mg/kg) or vehicle was administered to the control and lead-exposed rats from the time of their weaning by gastric gavage each day for 60 days, depending on their groups. At the age of 81-90 days, all the animals were subjected to the Morris water maze test and then used for extracellular recording of LTP in the dentate gyrus (DG) area of the hippocampus in vivo. Results A low dose of melatonin given from weaning for two months impaired LTP in the DG area of the hippocampus and induced learning and memory deficits in the control rats. When melatonin was administered over a prolonged period to the lead-exposed rats, it exacerbated the LTP impairment and the learning and memory deficits induced by lead. Conclusion Melatonin is not suitable for normal and lead-exposed children.
Abstract: In this paper, coordinated control of multiple robot manipulators holding a rigid object is discussed. In consideration of the inaccuracy of the dynamic model of a multiple-manipulator system, the error equations for object position and internal force are derived. Then a hybrid position/force coordinated learning control scheme is presented and its convergence is proved. The scheme can improve system performance by modifying the control input of the system after each iterative learning trial. Simulation results for two planar robot manipulators holding an object show the effectiveness of this control scheme.
Funding: The authors thank the Yayasan Universiti Teknologi PETRONAS (YUTP FRG Grant No. 015LC0-428) at Universiti Teknologi PETRONAS for supporting this study.
Abstract: Static Poisson's ratio (vs) is crucial for determining geomechanical properties in petroleum applications, namely sand production. Some models have been used to predict vs; however, the published models were limited to specific data ranges with an average absolute percentage relative error (AAPRE) of more than 10%. The published gated recurrent unit (GRU) models do not consider trend analysis to show physical behaviors. In this study, we aim to develop a GRU model using trend analysis and three inputs for predicting vs based on a broad range of data: vs (0.1627-0.4492), bulk formation density (RHOB) (0.315-2.994 g/mL), compressional time (DTc) (44.43-186.9 μs/ft), and shear time (DTs) (72.9-341.2 μs/ft). The GRU model was evaluated using different approaches, including statistical error analyses. The GRU model showed the proper trends, and the model data ranges were wider than previous ones. The GRU model has the largest correlation coefficient (R) of 0.967 and the lowest AAPRE, average percent relative error (APRE), root mean square error (RMSE), and standard deviation (SD) of 3.228%, 1.054%, 4.389, and 0.013, respectively, compared to other models. The GRU model has high accuracy for the different datasets: training, validation, testing, and the whole dataset, with R and AAPRE values of 0.981 and 2.601%, 0.966 and 3.274%, 0.967 and 3.228%, and 0.977 and 2.861%, respectively. The group error analyses of all inputs show that the GRU model has less than 5% AAPRE for all input ranges, which is superior to other models that have AAPRE values of more than 10% at various ranges of inputs.
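The headline metric, average absolute percentage relative error (AAPRE), is the mean of |measured − predicted| / |measured| × 100; a minimal sketch:

```python
def aapre(measured, predicted):
    """Average absolute percentage relative error, in percent."""
    terms = [abs((m - p) / m) * 100.0 for m, p in zip(measured, predicted)]
    return sum(terms) / len(terms)

# Predictions off by 10% and 5% give an AAPRE of 7.5%.
score = aapre([2.0, 4.0], [1.8, 4.2])
```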
Funding: supported by the National Natural Science Foundation of China (Grant No. 60374015).
Abstract: In this paper, a learning control approach is applied to the generalized projective synchronisation (GPS) of different chaotic systems with unknown periodically time-varying parameters. Using the Lyapunov-Krasovskii functional stability theory, a differential-difference mixed parametric learning law and an adaptive learning control law are constructed to make the states of two different chaotic systems asymptotically synchronised. The scheme is successfully applied to the generalized projective synchronisation between the Lorenz system and the Chen system. Moreover, numerical simulation results are used to verify the effectiveness of the proposed scheme.
Funding: supported by the General Program (No. 60774022) and the State Key Program (No. 60834001) of the National Natural Science Foundation of China.
Abstract: In this paper, the stability of iterative learning control with data dropouts is discussed. Through the super-vector formulation, an iterative learning control (ILC) system with data dropouts can be modeled as an asynchronous dynamical system with rate constraints on events in the iteration domain. The stability condition is provided in the form of linear matrix inequalities (LMIs) based on the stability of asynchronous dynamical systems. The analysis is supported by simulations.
Funding: This project was supported by the National Natural Science Foundation of China (60074001) and the Natural Science Foundation of Shandong Province (Y2000G02).
Abstract: The PD-type iterative learning control design of a class of affine nonlinear time-delay systems with external disturbances is considered. Sufficient conditions guaranteeing the convergence of the λ-norm of the tracking error are derived. It is shown that the system outputs can be guaranteed to converge to the desired trajectories in the absence of external disturbances and output measurement noises. In the presence of state disturbances and measurement noises, the tracking error will be uniformly bounded. A numerical simulation example is presented to validate the effectiveness of the proposed scheme.
Funding: Project 2007AA04Z144 supported by the National High-Tech Research and Development Program of China; Project 2007421119 supported by the China Postdoctoral Science Foundation.
Abstract: A new kind of volume-control hydraulic press that combines the advantages of both hydraulic and switched reluctance motor (SRM) driving technology is developed. Considering that serious dead zones and time-variant nonlinearity exist in the volume-control electro-hydraulic servo system, the iterative learning control (ILC) method is applied to tracking the displacement curve of the hydraulic press slider. In order to improve the convergence speed and precision of ILC, a fuzzy ILC algorithm that utilizes a fuzzy strategy to adaptively adjust the iterative learning gains is put forward. Simulation and experimental studies are carried out to investigate the convergence speed and precision of the fuzzy ILC for hydraulic press slider position tracking. The results show that the fuzzy ILC can raise the iterative learning speed enormously and realize tracking control of the slider displacement curve with rapid response and high control precision. In the experiment, a maximum tracking error of 0.02 V was achieved after only 12 iterations.
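The core idea, adapting the iterative learning gain to the current error magnitude, can be sketched with a simple blended gain schedule; the membership blend, gains, and toy plant below are illustrative stand-ins, not the paper's fuzzy rule base:

```python
def adaptive_gain(error_mag, low_gain=0.2, high_gain=0.8, scale=1.0):
    """Blend between a small and a large learning gain according to the
    degree to which the error is 'large' (a crude fuzzy membership)."""
    w = min(error_mag / scale, 1.0)
    return (1.0 - w) * low_gain + w * high_gain

def fuzzy_ilc(plant, ref, iterations):
    """P-type ILC whose gain is rescheduled each iteration from the error."""
    u = [0.0] * len(ref)
    for _ in range(iterations):
        y = plant(u)
        e = [r - yi for r, yi in zip(ref, y)]
        g = adaptive_gain(max(abs(ei) for ei in e))
        u = [ui + g * ei for ui, ei in zip(u, e)]
    return u

# Unity-gain toy plant: early iterations use a large gain (big error),
# later ones a small gain, and the tracking error still contracts.
plant = lambda u: [ui for ui in u]
ref = [1.0, 0.5]
u = fuzzy_ilc(plant, ref, iterations=60)
```

The large-error gain accelerates the first iterations while the small-error gain keeps the updates gentle near convergence, which is the speed/precision trade-off the fuzzy scheme targets.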