In this paper we study a bilinear optimal control problem for a diffusive Lotka-Volterra competition model with chemo-repulsion in a bounded domain of ℝ^N, N = 2, 3. This model describes the competition of two species in which one of them avoids encounters with rivals through a chemo-repulsion mechanism. We prove the existence and uniqueness of weak-strong solutions, and then we analyze the existence of a global optimal solution for a related bilinear optimal control problem, where the control acts on the chemical signal. Subsequently, we derive first-order optimality conditions for local optimal solutions using the Lagrange multipliers theory. Finally, we propose a discrete approximation scheme of the optimality system based on the gradient method, which is validated with some computational experiments.
This article studies the adaptive optimal output regulation problem for a class of interconnected singularly perturbed systems (SPSs) with unknown dynamics based on reinforcement learning (RL). Taking into account the slow and fast characteristics among system states, the interconnected SPS is decomposed into the slow time-scale dynamics and the fast time-scale dynamics through singular perturbation theory. For the fast time-scale dynamics with interconnections, we devise a decentralized optimal control strategy by selecting appropriate weight matrices in the cost function. For the slow time-scale dynamics with unknown system parameters, an off-policy RL algorithm with a convergence guarantee is given to learn the optimal control strategy from measurement data. By combining the slow and fast controllers, we establish the composite decentralized adaptive optimal output regulator, and rigorously analyze the stability and optimality of the closed-loop system. The proposed decomposition design not only bypasses the numerical stiffness but also alleviates the high dimensionality. The efficacy of the proposed methodology is validated by a load-frequency control application of a two-area power system.
We present an optimal and robust quantum control method for efficient population transfer in asymmetric double quantum-dot molecules. We derive a long-duration control scheme that allows for highly efficient population transfer by accurately controlling the amplitude of a narrow-bandwidth pulse. To overcome fluctuations in control field parameters, we employ a frequency-domain quantum optimal control theory method to optimize the spectral phase of a single pulse with broad bandwidth while preserving the spectral amplitude. It is shown that this spectral-phase-only optimization approach can successfully identify robust and optimal control fields, leading to efficient population transfer to the target state while concurrently suppressing population transfer to undesired states. The method demonstrates resilience to fluctuations in control field parameters, making it a promising approach for reliable and efficient population transfer in practical applications.
This paper presents a novel sequential inverse optimal control (SIOC) method for discrete-time systems, which calculates the unknown weight vectors of the cost function in real time using the input and output of an optimally controlled discrete-time system. The proposed method overcomes the limitations of previous approaches by eliminating the need for the invertible Jacobian assumption. It calculates the possible-solution spaces and their intersections sequentially until the dimension of the intersection space decreases to one. The remaining one-dimensional vector of the possible-solution space's intersection represents the SIOC solution. The paper presents clear conditions for convergence and addresses the issue of noisy data by clarifying the conditions for the singular values of the matrices that relate to the possible-solution space. The effectiveness of the proposed method is demonstrated through simulation results.
In this paper, a new optimal adaptive backstepping control approach for nonlinear systems under deception attacks via reinforcement learning (RL) is presented. The existence of nonlinear terms in the studied system makes it very difficult to design the optimal controller using traditional methods. To achieve optimal control, an RL algorithm based on a critic–actor architecture is considered for the nonlinear system. Due to the significant security risks of network transmission, the system is vulnerable to deception attacks, which can make all the system states unavailable. By using the attacked states to design the coordinate transformation, the harm brought by unknown deception attacks is overcome. The presented control strategy can ensure that all signals in the closed-loop system are semi-globally ultimately bounded. Finally, a simulation experiment is presented to demonstrate the effectiveness of the strategy.
In this paper, the matrix Riccati equation is considered. There is no general method for solving the matrix Riccati equation despite the many fields to which it applies. While the scalar Riccati equation has been studied thoroughly, the matrix Riccati equation, of which the scalar Riccati equation is a particular case, is much less investigated. This article proposes a change of variable that makes it possible to find an explicit solution of the matrix Riccati equation. We then apply this solution to optimal control.
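As a hedged illustration of how a change of variable can linearize a Riccati equation, the sketch below treats only the scalar case, using the classical substitution p = (r / b²) · u′/u. It is not the matrix change of variable proposed in the article, which the abstract does not specify. The equation dp/dτ = q + 2ap − (b²/r)p² with p(0) = 0, and the names `a`, `b`, `q`, `r`, `riccati_rk4`, and `riccati_closed_form`, are assumptions chosen for the example; the sketch also assumes q > 0, r > 0, and b ≠ 0.

```python
import math


def riccati_rhs(p, a, b, q, r):
    """Right-hand side of the scalar Riccati ODE dp/dtau = q + 2*a*p - (b**2/r)*p**2."""
    return q + 2.0 * a * p - (b * b / r) * p * p


def riccati_rk4(a, b, q, r, tau_end, steps=2000):
    """Integrate the scalar Riccati ODE from p(0) = 0 with the classical RK4 scheme."""
    h = tau_end / steps
    p = 0.0
    for _ in range(steps):
        k1 = riccati_rhs(p, a, b, q, r)
        k2 = riccati_rhs(p + 0.5 * h * k1, a, b, q, r)
        k3 = riccati_rhs(p + 0.5 * h * k2, a, b, q, r)
        k4 = riccati_rhs(p + h * k3, a, b, q, r)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


def riccati_closed_form(a, b, q, r, tau):
    """Explicit solution via p = (r/b**2) * u'/u, where u solves the LINEAR
    equation u'' = 2*a*u' + (q*b**2/r)*u with u(0) = 1, u'(0) = 0."""
    beta = math.sqrt(a * a + q * b * b / r)
    lam1, lam2 = a + beta, a - beta  # roots of the characteristic polynomial
    # Coefficients chosen so that u(0) = 1 and u'(0) = 0, i.e. p(0) = 0.
    c1 = -lam2 / (lam1 - lam2)
    c2 = lam1 / (lam1 - lam2)
    u = c1 * math.exp(lam1 * tau) + c2 * math.exp(lam2 * tau)
    du = c1 * lam1 * math.exp(lam1 * tau) + c2 * lam2 * math.exp(lam2 * tau)
    return (r / (b * b)) * du / u


if __name__ == "__main__":
    # Compare the numerical and the explicit solution for illustrative parameters.
    print(riccati_rk4(0.3, 2.0, 1.0, 0.5, 2.0))
    print(riccati_closed_form(0.3, 2.0, 1.0, 0.5, 2.0))
```

With a = 0 and b = q = r = 1 the substitution gives u = cosh τ, so the explicit solution reduces to p(τ) = tanh τ.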
In this paper, we propose a nonconforming virtual element method (NCVEM) discretization for an optimal control problem with pointwise control constraints governed by elliptic equations. Based on the NCVEM approximation of the state equation and the variational discretization of the control variable, we construct a virtual element discrete scheme. For the state, adjoint state and control variable, we obtain the corresponding a priori estimates in the H^1 and L^2 norms. Finally, some numerical experiments are carried out to support the theoretical results.
This paper studies a novel distributed optimization problem that aims to minimize the sum of the non-convex objective functionals of a multi-agent network under privacy protection, which means that the local objective of each agent is unknown to the others. This problem involves complexity in both the time and space aspects. Yet existing works on distributed optimization mainly consider privacy protection in the space aspect, where the decision variable is a vector with finite dimensions. In contrast, when the time aspect is considered, as in this paper, the decision variable is a continuous function of time. Hence, the minimization of the overall functional belongs to the calculus of variations. Traditional works usually aim to seek the optimal decision function. Due to privacy protection and non-convexity, the Euler-Lagrange equation of the proposed problem is a complicated partial differential equation. Hence, we seek the optimal decision derivative function rather than the decision function. This approach can be regarded as seeking the control input for an optimal control problem, for which we propose a centralized reinforcement learning (RL) framework. In the space aspect, we further present a distributed reinforcement learning framework to deal with the impact of privacy protection. Finally, rigorous theoretical analysis and simulation validate the effectiveness of our framework.
For infinite-horizon optimal control problems of discrete time-varying nonlinear systems, this paper develops a new iterative adaptive dynamic programming algorithm, the discrete-time time-varying (DTTV) policy iteration algorithm. The iterative control law is designed to update the iterative value function, which approximates the index function of optimal performance. The admissibility of the iterative control law is analyzed. The results show that the iterative value function converges non-increasingly to the optimal solution of the Bellman equation. To implement the algorithm, neural networks are employed and a new implementation structure is established, which avoids solving the generalized Bellman equation in each iteration. Finally, the optimal control laws for torsional pendulum and inverted pendulum systems are obtained by using the DTTV policy iteration algorithm, where the mass and pendulum bar length are permitted to be time-varying parameters. The effectiveness of the developed method is illustrated by numerical results and comparisons.
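The non-increasing convergence of policy iteration can be seen in a much simpler setting than the one in the abstract. The sketch below is not the DTTV algorithm: it swaps in a time-invariant scalar linear-quadratic problem with no neural networks. The system x_{k+1} = a·x_k + b·u_k, the stage cost q·x² + r·u², the feedback law u = −k·x, and all function names are assumptions made for illustration.

```python
def evaluate_policy(k, a, b, q, r):
    """Policy evaluation: solve P = q + r*k**2 + (a - b*k)**2 * P for the gain k."""
    closed = a - b * k
    if abs(closed) >= 1.0:
        # An inadmissible gain has infinite cost, so policy iteration cannot start from it.
        raise ValueError("gain is not admissible: closed loop is unstable")
    return (q + r * k * k) / (1.0 - closed * closed)


def improve_policy(p, a, b, r):
    """Policy improvement: the gain that minimizes the one-step cost plus P * x_next**2."""
    return a * b * p / (r + b * b * p)


def policy_iteration(k0, a, b, q, r, iterations=20):
    """Alternate evaluation and improvement, starting from an admissible gain k0."""
    k = k0
    values = []
    for _ in range(iterations):
        p = evaluate_policy(k, a, b, q, r)
        values.append(p)
        k = improve_policy(p, a, b, r)
    return k, values


if __name__ == "__main__":
    # Open-loop unstable plant (a = 1.2) with an admissible initial gain k0 = 1.0.
    gain, values = policy_iteration(1.0, 1.2, 1.0, 1.0, 1.0)
    print(gain, values[:4])
```

In this toy setting the sequence of value coefficients decreases toward the solution of the scalar discrete-time Riccati equation, which mirrors the non-increasing behavior stated above.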
We present a mathematical and numerical study for a pointwise optimal control problem governed by a variable-coefficient Riesz-fractional diffusion equation. Due to the impact of the variable diffusivity coefficient, existing regularity results for the constant-coefficient counterparts do not apply, while the bilinear forms of the state (adjoint) equation may lose the coercivity that is critical in error estimates of the finite element method. We reformulate the state equation as an equivalent constant-coefficient fractional diffusion equation with the addition of a variable-coefficient low-order fractional advection term. First-order optimality conditions are accordingly derived and the smoothing properties of the solutions are analyzed by, e.g., interpolation estimates. The weak coercivity of the resulting bilinear forms is proven via the Gårding inequality, based on which we prove the optimal-order convergence estimates of the finite element method for the (adjoint) state variable and the control variable. Numerical experiments substantiate the theoretical predictions.
Safety-critical control is often trained in a simulated environment to mitigate risk. Subsequent migration of the biased controller requires further adjustments. In this paper, an experience inference human-behavior learning approach is proposed to solve the migration problem of optimal controllers applied to real-world nonlinear systems. The approach is inspired by the complementary properties exhibited by the hippocampus, the neocortex, and the striatum learning systems located in the brain. The hippocampus defines a physics-informed reference model of the real-world nonlinear system for experience inference, and the neocortex is the adaptive dynamic programming (ADP) or reinforcement learning (RL) algorithm that ensures optimal performance of the reference model. This optimal performance is inferred to the real-world nonlinear system by means of an adaptive neocortex/striatum control policy that forces the nonlinear system to behave as the reference model. Stability and convergence of the proposed approach are analyzed using Lyapunov stability theory. Simulation studies are carried out to verify the approach.
This paper is concerned with a novel integrated multi-step heuristic dynamic programming (MsHDP) algorithm for solving optimal control problems. It is shown that, initialized by the zero cost function, MsHDP can converge to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. Then, the stability of the system is analyzed using control policies generated by MsHDP. Also, a general stability criterion is designed to determine the admissibility of the current control policy. That is, the criterion is applicable not only to traditional value iteration and policy iteration but also to MsHDP. Further, based on the convergence and the stability criterion, the integrated MsHDP algorithm using immature control policies is developed to greatly accelerate learning. In addition, an actor-critic structure is utilized to implement the integrated MsHDP scheme, where neural networks serve as the parametric architecture to evaluate and improve the iterative policy. Finally, two simulation examples are given to demonstrate that the learning effectiveness of the integrated MsHDP scheme surpasses that of other fixed or integrated methods.
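To make the idea of starting from the zero cost function concrete, the following sketch runs a multi-step value update on a scalar linear-quadratic problem. It is a simplified stand-in under stated assumptions, not the integrated MsHDP algorithm of the paper. It assumes the system x_{k+1} = a·x_k + b·u_k, the stage cost q·x² + r·u², a quadratic value function P·x², and one common form of multi-step update: the greedy policy is applied for `steps` stages before the old value function is used as terminal cost. With `steps=1` this reduces to ordinary value iteration. All names are hypothetical.

```python
def greedy_gain(p, a, b, r):
    """Gain of the policy u = -k*x that is greedy with respect to the value P*x**2."""
    return a * b * p / (r + b * b * p)


def multi_step_update(p, a, b, q, r, steps):
    """Apply the greedy policy for `steps` stages, then add the old value as terminal cost."""
    k = greedy_gain(p, a, b, r)
    closed_sq = (a - b * k) ** 2   # squared closed-loop factor, x_next**2 = closed_sq * x**2
    stage = q + r * k * k          # stage cost coefficient under the greedy policy
    total = 0.0
    weight = 1.0                   # equals closed_sq**j at stage j
    for _ in range(steps):
        total += stage * weight
        weight *= closed_sq
    return total + weight * p


def iterate_from_zero(a, b, q, r, steps, iterations=60):
    """Run the update starting from the zero value function P = 0."""
    p = 0.0
    history = [p]
    for _ in range(iterations):
        p = multi_step_update(p, a, b, q, r, steps)
        history.append(p)
    return history


def riccati_residual(p, a, b, q, r):
    """Residual of the scalar discrete-time Riccati equation; zero at the optimum."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p) - p


if __name__ == "__main__":
    for m in (1, 3):
        final = iterate_from_zero(1.2, 1.0, 1.0, 1.0, steps=m)[-1]
        print(m, final, riccati_residual(final, 1.2, 1.0, 1.0, 1.0))
```

Comparing the iteration histories for different values of `steps` gives a rough feel for how the lookahead length affects the speed of convergence in this toy problem.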
On the multilingual online social networks of global information sharing, the wanton spread of rumors has an enormous negative impact on people's lives. Thus, it is essential to explore the rumor-spreading rules in a multilingual environment and formulate corresponding control strategies to reduce the harm caused by rumor propagation. In this paper, considering the multilingual environment and intervention mechanism in the rumor-spreading process, an improved ignorants–spreaders-1–spreaders-2–removers (I2SR) rumor-spreading model with time delay and nonlinear incidence is established in heterogeneous networks. Firstly, based on the mean-field equations corresponding to the model, the basic reproduction number is derived to ensure the existence of the rumor-spreading equilibrium. Secondly, by applying Lyapunov stability theory and graph theory, the global stability of the rumor-spreading equilibrium is analyzed in detail. In particular, aiming at the lowest control cost, the optimal control scheme is designed to optimize the intervention mechanism, and the optimal control conditions are derived using Pontryagin's minimum principle. Finally, some illustrative examples are provided to verify the effectiveness of the theoretical results. The results show that optimizing the intervention mechanism can effectively reduce the densities of spreaders-1 and spreaders-2 within the expected time, which provides guiding insights for public opinion managers to control rumors.
In this paper, we consider the optimal risk sharing problem between two parties in the insurance business: the insurer and the insured. The risk is allocated between the insurer and the insured by setting a deductible and coverage in the insurance contract. We obtain the optimal deductible and coverage by considering the expected product of the two parties' utilities of terminal wealth according to stochastic optimal control theory. An equilibrium policy is also derived for when there are both a deductible and coverage; this is done by modelling the problem as a stochastic game in a continuous-time framework. A numerical example is provided to illustrate the results of the paper.
This paper presents a neighborhood optimal trajectory online correction algorithm considering terminal time variation, and investigates its application range. Firstly, the motion model of midcourse guidance is established, and the online trajectory correction-regenerating strategy is introduced. Secondly, based on the neighborhood optimal control theory, a neighborhood optimal trajectory online correction algorithm is proposed by adding terminal time variation to the traditional neighborhood optimal trajectory correction method. Thirdly, the Monte Carlo simulation method is used to analyze the application range of the algorithm, which provides a basis for dividing the application domains of the online correction algorithm and the online regeneration algorithm of the midcourse guidance trajectory. Finally, the simulation results show that the algorithm has high real-time performance, and the online corrected trajectory can meet the requirements of terminal constraint change. The application range of the algorithm is obtained through Monte Carlo simulation.
As the largest source of carbon emissions in China, the thermal power industry is the only emission-controlled industry in the first national carbon market compliance cycle. Its conversion to clean-energy generation technologies is also an important means of reducing CO₂ emissions and achieving the carbon peak and carbon neutrality commitments. This study used fractional Brownian motion to describe the energy-switching cost and constructed a stochastic optimization model on carbon allowance (CA) trading volume and emission-reduction strategy during the compliance period, with the Hurst exponent and volatility coefficient in the model estimated. We characterized the optimal compliance cost of thermal power enterprises as the unique solution of the Hamilton–Jacobi–Bellman equation by combining the dynamic optimization principle and the fractional Itô's formula. In this manner, we obtained the models for optimal emission reduction and equilibrium CA price. Our numerical analysis revealed that, within a compliance period of 2021–2030, the optimal reductions and desired equilibrium prices of CAs changed concurrently, with an increasing trend annually in different peak-year scenarios. Furthermore, sensitivity analysis revealed that the energy price indirectly affected the equilibrium CA price by influencing the Hurst exponent, the depreciation rate positively impacted the CA price, and increasing the initial CA reduced the optimal reduction and the CA price. Our findings can be used to develop optimal emission-reduction strategies for thermal power enterprises and carbon pricing in the carbon market.
In this paper, we discuss the virtual element method (VEM) approximation of an optimal control problem governed by the Brinkman equations with control constraints. Based on the polynomial projections and variational discretization of the control variable, we build up the virtual element discrete scheme of the optimal control problem and derive the discrete first-order optimality system. A priori error estimates for the state, adjoint state and control variables in the L^2 and H^1 norms are derived. The theoretical findings are illustrated by numerical experiments.
Reinforcement learning (RL) has roots in dynamic programming and is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, where the main results for discrete-time systems and continuous-time systems are surveyed, respectively. Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environments is discussed, respectively, where event-based design, robust stabilization, and game design are reviewed. Moreover, the extensions of ADP for addressing control problems under complex environments attract enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they promote ADP formulation significantly. Finally, several typical control applications with respect to RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, the comprehensive survey on ADP and RL for advanced control applications has demonstrated its remarkable potential within the artificial intelligence era. In addition, it also plays a vital role in promoting environmental protection and industrial intelligence.
Upon infecting a host cell, the reticulate body (RB) form of the Chlamydia bacteria simply proliferates by binary fission for an extended period. Available data show only RB units in the infected cells 20 hours post infection (hpi), spanning nearly halfway through the development cycle. With data collected every 4 hpi, conversion to the elementary body (EB) form begins abruptly at a rapid rate sometime around 24 hpi. By modeling proliferation and conversion as simple birth and death processes, it has been shown that the optimal strategy for maximizing the total (mean) EB population at host cell lysis time is a bang-bang control qualitatively replicating the observed conversion activities. However, the simple birth and death model for the RB proliferation and conversion to EB deviates in a significant way from the available data on the evolution of the RB population after the onset of RB-to-EB conversion. By working with a more refined model that takes into account a small size threshold eligibility requirement for conversion noted in the available data, we succeed in removing the deficiency of the previous models on the evolution of the RB population without affecting the optimal bang-bang conversion strategy.
The combination of structural health monitoring and vibration control is of great importance to provide components of smart structures. While synthetic algorithms have been proposed, adaptive control that is compatible with changing conditions still needs to be used, and time-varying systems are required to be simultaneously estimated with the application of adaptive control. In this research, the identification of structural time-varying dynamic characteristics and optimized simple adaptive control are integrated. First, reduced variations of physical parameters are estimated online using the multiple forgetting factor recursive least squares (MFRLS) method. Then, the energy from the structural vibration is simultaneously specified to optimize the control force with the identified parameters to be operational. Optimization is also performed based on the probability density function of the energy under the seismic excitation at any time. Finally, the optimal control force is obtained by the simple adaptive control (SAC) algorithm and energy coefficient. A numerical example and benchmark structure are employed to investigate the efficiency of the proposed approach. The simulation results revealed the effectiveness of the integrated online identification and optimal adaptive control in systems.
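The role of a forgetting factor in online identification can be sketched with a single-parameter recursive least squares (RLS) estimator. This is a deliberate simplification: the abstract refers to a multiple forgetting factor variant (MFRLS), whose update equations are not given here, so the code uses the standard single-factor RLS recursion instead. The measurement model y = θ·φ, the abrupt drop of θ from 2.0 to 1.5 (imitating a stiffness loss), the noiseless excitation signal, and all names are assumptions made for illustration.

```python
import math


def rls_step(theta, p, phi, y, lam):
    """One scalar RLS update with forgetting factor lam (0 < lam <= 1)."""
    gain = p * phi / (lam + phi * p * phi)
    theta_new = theta + gain * (y - phi * theta)   # correct the estimate with the prediction error
    p_new = (p - gain * phi * p) / lam             # dividing by lam discounts old data
    return theta_new, p_new


def track_parameter(lam=0.9, samples=200, change_at=100):
    """Track a parameter that drops from 2.0 to 1.5 at sample `change_at`."""
    theta, p = 0.0, 1000.0     # uninformative initial estimate and large initial covariance
    estimates = []
    for k in range(samples):
        true_theta = 2.0 if k < change_at else 1.5
        phi = 1.5 + math.sin(0.1 * k)   # hypothetical, persistently exciting regressor
        y = true_theta * phi            # noiseless measurement for clarity
        theta, p = rls_step(theta, p, phi, y, lam)
        estimates.append(theta)
    return estimates


if __name__ == "__main__":
    est = track_parameter()
    print(est[99], est[-1])
```

A forgetting factor closer to 1 averages over more data and reacts more slowly to the parameter change, while a smaller factor tracks faster at the price of higher noise sensitivity when measurements are noisy.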
Funding (Lotka-Volterra chemo-repulsion control paper): Supported by Vicerrectoría de Investigación y Extensión of Universidad Industrial de Santander, Colombia, project 3704.
Funding (adaptive optimal output regulation of interconnected SPSs): Supported by the National Natural Science Foundation of China (62073327, 62273350) and the Natural Science Foundation of Jiangsu Province (BK20221112).
Funding (quantum control of double quantum-dot molecules): This work was supported by the National Natural Science Foundation of China (Grant Nos. 12275033, 61973317, and 12274470), the Natural Science Foundation of Hunan Province for Distinguished Young Scholars (Grant No. 2022JJ10070), the Natural Science Foundation of Hunan Province (Grant No. 2022JJ30582), and the Scientific Research Fund of Hunan Provincial Education Department (Grant No. 20A025).
Funding (adaptive backstepping control under deception attacks): Supported in part by the National Key R&D Program of China under Grant 2021YFE0206100; in part by the National Natural Science Foundation of China under Grant 62073321; in part by the National Defense Basic Scientific Research Program under Grant JCKY2019203C029; in part by the Science and Technology Development Fund, Macao SAR, under Grants FDCT-22-009-MISE, 0060/2021/A2 and 0015/2020/AMJ; and in part by the National Defense Basic Scientific Research Project (JCKY2020130C025).
文摘In this paper,a new optimal adaptive backstepping control approach for nonlinear systems under deception attacks via reinforcement learning is presented in this paper.The existence of nonlinear terms in the studied system makes it very difficult to design the optimal controller using traditional methods.To achieve optimal control,RL algorithm based on critic–actor architecture is considered for the nonlinear system.Due to the significant security risks of network transmission,the system is vulnerable to deception attacks,which can make all the system state unavailable.By using the attacked states to design coordinate transformation,the harm brought by unknown deception attacks has been overcome.The presented control strategy can ensure that all signals in the closed-loop system are semi-globally ultimately bounded.Finally,the simulation experiment is shown to prove the effectiveness of the strategy.
文摘In this paper, the matrix Riccati equation is considered. There is no general way for solving the matrix Riccati equation despite the many fields to which it applies. While scalar Riccati equation has been studied thoroughly, matrix Riccati equation of which scalar Riccati equations is a particular case, is much less investigated. This article proposes a change of variable that allows to find explicit solution of the Matrix Riccati equation. We then apply this solution to Optimal Control.
Abstract: In this paper, we propose a nonconforming virtual element method (NCVEM) discretization for an optimal control problem with pointwise control constraints governed by elliptic equations. Based on the NCVEM approximation of the state equation and the variational discretization of the control variable, we construct a virtual element discrete scheme. For the state, adjoint state and control variable, we obtain the corresponding a priori estimates in the H^1 and L^2 norms. Finally, some numerical experiments are carried out to support the theoretical results.
Funding: Supported in part by the National Natural Science Foundation of China (NSFC) (61773260) and the Ministry of Science and Technology (2018YFB130590).
Abstract: This paper studies a novel distributed optimization problem that aims to minimize the sum of the non-convex objective functionals of a multi-agent network under privacy protection, which means that the local objective of each agent is unknown to the others. The problem involves complexity in both the time and space aspects. Yet existing works on distributed optimization mainly consider privacy protection in the space aspect, where the decision variable is a vector of finite dimension. In contrast, when the time aspect is considered, as in this paper, the decision variable is a continuous function of time. Hence, the minimization of the overall functional belongs to the calculus of variations. Traditional works usually seek the optimal decision function. Due to privacy protection and non-convexity, the Euler-Lagrange equation of the proposed problem is a complicated partial differential equation. Hence, we seek the optimal decision derivative function rather than the decision function. This approach can be regarded as seeking the control input for an optimal control problem, for which we propose a centralized reinforcement learning (RL) framework. In the space aspect, we further present a distributed reinforcement learning framework to deal with the impact of privacy protection. Finally, rigorous theoretical analysis and simulation validate the effectiveness of our framework.
Funding: Supported in part by the Fundamental Research Funds for the Central Universities (2022JBZX024) and in part by the National Natural Science Foundation of China (61872037, 61273167).
Abstract: For infinite-horizon optimal control problems of discrete time-varying nonlinear systems, this paper develops a new iterative adaptive dynamic programming algorithm, the discrete-time time-varying policy iteration (DTTV) algorithm. The iterative control law is designed to update the iterative value function, which approximates the index function of optimal performance. The admissibility of the iterative control law is analyzed. The results show that the iterative value function converges monotonically (non-increasingly) to the optimal solution of the Bellman equation. To implement the algorithm, neural networks are employed and a new implementation structure is established, which avoids solving the generalized Bellman equation in each iteration. Finally, the optimal control laws for torsional pendulum and inverted pendulum systems are obtained by using the DTTV policy iteration algorithm, where the mass and pendulum bar length are permitted to be time-varying parameters. The effectiveness of the developed method is illustrated by numerical results and comparisons.
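The monotone convergence property of policy iteration can be illustrated on a much simpler problem than the one treated in the paper. The sketch below is not the DTTV algorithm: it assumes a time-invariant scalar linear system x_{k+1} = a*x_k + b*u_k with stage cost q*x^2 + r*u^2, uses exact policy evaluation instead of neural networks, and all names are illustrative assumptions. Starting from an admissible (stabilizing) gain, each iteration evaluates the current policy and then improves it.

```python
def evaluate_policy(a, b, q, r, gain):
    """Value coefficient p of the policy u = -gain * x, i.e. V(x) = p * x**2,
    from p = q + r*gain**2 + (a - b*gain)**2 * p."""
    closed_loop = a - b * gain
    if abs(closed_loop) >= 1.0:
        raise ValueError("gain is not admissible: the closed loop is unstable")
    return (q + r * gain**2) / (1.0 - closed_loop**2)


def improve_policy(a, b, r, p):
    """Gain minimizing r*u**2 + p*(a*x + b*u)**2 over u for the current p."""
    return a * b * p / (r + b**2 * p)


def policy_iteration(a, b, q, r, gain0, iterations=20):
    """Alternate policy evaluation and improvement from an admissible gain0.
    Returns the final gain and the list of value coefficients per iteration."""
    gain, values = gain0, []
    for _ in range(iterations):
        p = evaluate_policy(a, b, q, r, gain)
        values.append(p)
        gain = improve_policy(a, b, r, p)
    return gain, values


if __name__ == "__main__":
    gain, values = policy_iteration(a=1.2, b=1.0, q=1.0, r=1.0, gain0=1.0)
    print(gain, values[:4])
```

In this linear-quadratic setting the sequence of value coefficients is non-increasing and its limit satisfies the discrete-time algebraic Riccati equation, which mirrors the convergence statement in the abstract on a toy scale.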
Funding: Supported by the National Natural Science Foundation of China (11971276, 12171287), the Natural Science Foundation of Shandong Province (ZR2016JL004), the China Postdoctoral Science Foundation (2021TQ0017, 2021M700244), and the International Postdoctoral Exchange Fellowship Program (Talent-Introduction Program) (YJ20210019).
Abstract: We present a mathematical and numerical study of a pointwise optimal control problem governed by a variable-coefficient Riesz-fractional diffusion equation. Due to the impact of the variable diffusivity coefficient, existing regularity results for the constant-coefficient counterparts do not apply, while the bilinear forms of the state (adjoint) equation may lose the coercivity that is critical in error estimates of the finite element method. We reformulate the state equation as an equivalent constant-coefficient fractional diffusion equation with the addition of a variable-coefficient low-order fractional advection term. First-order optimality conditions are derived accordingly, and the smoothing properties of the solutions are analyzed by, e.g., interpolation estimates. The weak coercivity of the resulting bilinear forms is proven via the Gårding inequality, based on which we prove optimal-order convergence estimates of the finite element method for the (adjoint) state variable and the control variable. Numerical experiments substantiate the theoretical predictions.
Funding: Supported by the Royal Academy of Engineering and the Office of the Chief Science Adviser for National Security under the UK Intelligence Community Postdoctoral Research Fellowship programme.
Abstract: Safety-critical control is often trained in a simulated environment to mitigate risk. Subsequent migration of the biased controller requires further adjustments. In this paper, an experience inference human-behavior learning approach is proposed to solve the migration problem of optimal controllers applied to real-world nonlinear systems. The approach is inspired by the complementary properties exhibited by the hippocampus, neocortex, and striatum learning systems located in the brain. The hippocampus defines a physics-informed reference model of the real-world nonlinear system for experience inference, and the neocortex is the adaptive dynamic programming (ADP) or reinforcement learning (RL) algorithm that ensures optimal performance of the reference model. This optimal performance is inferred to the real-world nonlinear system by means of an adaptive neocortex/striatum control policy that forces the nonlinear system to behave as the reference model. Stability and convergence of the proposed approach are analyzed using Lyapunov stability theory. Simulation studies are carried out to verify the approach.
Funding: The National Key Research and Development Program of China (2021ZD0112302), the National Natural Science Foundation of China (62222301, 61890930-5, 62021003), and the Beijing Natural Science Foundation (JQ19013).
Abstract: This paper is concerned with a novel integrated multi-step heuristic dynamic programming (MsHDP) algorithm for solving optimal control problems. It is shown that, initialized by the zero cost function, MsHDP can converge to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. Then, the stability of the system is analyzed using control policies generated by MsHDP. Also, a general stability criterion is designed to determine the admissibility of the current control policy. That is, the criterion is applicable not only to traditional value iteration and policy iteration but also to MsHDP. Further, based on the convergence and the stability criterion, the integrated MsHDP algorithm using immature control policies is developed to greatly accelerate learning. In addition, an actor-critic structure is utilized to implement the integrated MsHDP scheme, where neural networks are used as the parameter architecture to evaluate and improve the iterative policy. Finally, two simulation examples are given to demonstrate that the learning effectiveness of the integrated MsHDP scheme surpasses that of other fixed or integrated methods.
Funding: The National Natural Science Foundation of the People's Republic of China (Grant Nos. U1703262 and 62163035), the Special Project for Local Science and Technology Development Guided by the Central Government (Grant No. ZYYD2022A05), and the Xinjiang Key Laboratory of Applied Mathematics (Grant No. XJDX1401).
Abstract: On multilingual online social networks for global information sharing, the wanton spread of rumors has an enormous negative impact on people's lives. Thus, it is essential to explore the rumor-spreading rules in a multilingual environment and formulate corresponding control strategies to reduce the harm caused by rumor propagation. In this paper, considering the multilingual environment and the intervention mechanism in the rumor-spreading process, an improved ignorants–spreaders-1–spreaders-2–removers (I2SR) rumor-spreading model with time delay and nonlinear incidence is established in heterogeneous networks. Firstly, based on the mean-field equations corresponding to the model, the basic reproduction number is derived to ensure the existence of the rumor-spreading equilibrium. Secondly, by applying Lyapunov stability theory and graph theory, the global stability of the rumor-spreading equilibrium is analyzed in detail. In particular, aiming at the lowest control cost, the optimal control scheme is designed to optimize the intervention mechanism, and the optimal control conditions are derived using Pontryagin's minimum principle. Finally, some illustrative examples are provided to verify the effectiveness of the theoretical results. The results show that optimizing the intervention mechanism can effectively reduce the densities of spreaders-1 and spreaders-2 within the expected time, which provides guiding insights for public opinion managers to control rumors.
Funding: Supported by the NSF of China (11931018, 12271274) and the Tianjin Natural Science Foundation (19JCYBJC30400).
Abstract: In this paper, we consider the optimal risk sharing problem between two parties in the insurance business: the insurer and the insured. The risk is allocated between the insurer and the insured by setting a deductible and coverage in the insurance contract. We obtain the optimal deductible and coverage by considering the expected product of the two parties' utilities of terminal wealth according to stochastic optimal control theory. An equilibrium policy is also derived for the case in which there are both a deductible and coverage; this is done by modelling the problem as a stochastic game in a continuous-time framework. A numerical example is provided to illustrate the results of the paper.
Funding: Supported by the National Natural Science Foundation of China (61873278, 62173339).
Abstract: This paper presents a neighborhood optimal trajectory online correction algorithm that considers terminal time variation, and investigates its application range. Firstly, the motion model of midcourse guidance is established, and the online trajectory correction-regeneration strategy is introduced. Secondly, based on neighborhood optimal control theory, a neighborhood optimal trajectory online correction algorithm is proposed by adding terminal time variation to the traditional neighborhood optimal trajectory correction method. Thirdly, the Monte Carlo simulation method is used to analyze the application range of the algorithm, which provides a basis for dividing the application domains of the online correction algorithm and the online regeneration algorithm for midcourse guidance trajectories. Finally, the simulation results show that the algorithm has high real-time performance and that the online corrected trajectory can meet the requirements of terminal constraint changes. The application range of the algorithm is obtained through Monte Carlo simulation.
Funding: The authors would like to thank the Major Program of the National Philosophy and Social Science Foundation of China (Grant No. 21ZDA086), the National Natural Science Foundation of China (Grant No. 71974188), and the Jiangsu Soft Science Fund (Grant No. BR2022007).
Abstract: As the largest source of carbon emissions in China, the thermal power industry is the only emission-controlled industry in the first national carbon market compliance cycle. Its conversion to clean-energy generation technologies is also an important means of reducing CO_2 emissions and achieving the carbon peak and carbon neutrality commitments. This study used fractional Brownian motion to describe the energy-switching cost and constructed a stochastic optimization model of carbon allowance (CA) trading volume and emission-reduction strategy during the compliance period, with the Hurst exponent and volatility coefficient in the model estimated. We characterized the optimal compliance cost of thermal power enterprises as the unique solution of the Hamilton–Jacobi–Bellman equation by combining the dynamic optimization principle and the fractional Itô formula. In this manner, we obtained the models for optimal emission reduction and the equilibrium CA price. Our numerical analysis revealed that, within a compliance period of 2021–2030, the optimal reductions and desired equilibrium prices of CAs changed concurrently, with an increasing annual trend in different peak-year scenarios. Furthermore, sensitivity analysis revealed that the energy price indirectly affected the equilibrium CA price by influencing the Hurst exponent, the depreciation rate positively impacted the CA price, and increasing the initial CA reduced the optimal reduction and the CA price. Our findings can be used to develop optimal emission-reduction strategies for thermal power enterprises and carbon pricing in the carbon market.
Abstract: In this paper, we discuss the virtual element method (VEM) approximation of an optimal control problem governed by the Brinkman equations with control constraints. Based on polynomial projections and the variational discretization of the control variable, we build the virtual element discrete scheme of the optimal control problem and derive the discrete first-order optimality system. A priori error estimates for the state, adjoint state and control variables in the L^2 and H^1 norms are derived. The theoretical findings are illustrated by numerical experiments.
Funding: Supported in part by the National Natural Science Foundation of China (62222301, 62073085, 62073158, 61890930-5, 62021003), the National Key Research and Development Program of China (2021ZD0112302, 2021ZD0112301, 2018YFC1900800-5), and the Beijing Natural Science Foundation (JQ19013).
Abstract: Reinforcement learning (RL) has roots in dynamic programming, and it is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, where the main results for discrete-time systems and continuous-time systems are surveyed, respectively. Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environments is discussed, where event-based design, robust stabilization, and game design are reviewed. Moreover, the extensions of ADP for addressing control problems under complex environments have attracted enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they promote ADP formulation significantly. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, the comprehensive survey on ADP and RL for advanced control applications has demonstrated its remarkable potential within the artificial intelligence era. In addition, it also plays a vital role in promoting environmental protection and industrial intelligence.
Abstract: Upon infecting a host cell, the reticulate body (RB) form of the Chlamydia bacteria simply proliferates by binary fission for an extended period. Available data show only RB units in the infected cells 20 hours post infection (hpi), spanning nearly halfway through the development cycle. With data collected every 4 hpi, conversion to the elementary body (EB) form begins abruptly at a rapid rate sometime around 24 hpi. By modeling proliferation and conversion as simple birth and death processes, it has been shown that the optimal strategy for maximizing the total (mean) EB population at host cell lysis time is a bang-bang control that qualitatively replicates the observed conversion activities. However, the simple birth and death model for RB proliferation and conversion to EB deviates in a significant way from the available data on the evolution of the RB population after the onset of RB-to-EB conversion. By working with a more refined model that takes into account a small size threshold eligibility requirement for conversion noted in the available data, we succeed in removing the deficiency of the previous models on the evolution of the RB population without affecting the optimal bang-bang conversion strategy.
Abstract: The combination of structural health monitoring and vibration control is of great importance for providing components of smart structures. While synthetic algorithms have been proposed, adaptive control that is compatible with changing conditions still needs to be used, and time-varying systems must be estimated simultaneously with the application of adaptive control. In this research, the identification of structural time-varying dynamic characteristics and optimized simple adaptive control are integrated. First, reduced variations of physical parameters are estimated online using the multiple forgetting factor recursive least squares (MFRLS) method. Then, the energy from the structural vibration is simultaneously specified to optimize the control force with the identified parameters to be operational. Optimization is also performed based on the probability density function of the energy under the seismic excitation at any time. Finally, the optimal control force is obtained by the simple adaptive control (SAC) algorithm and energy coefficient. A numerical example and a benchmark structure are employed to investigate the efficiency of the proposed approach. The simulation results reveal the effectiveness of the integrated online identification and optimal adaptive control in systems.
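The role of a forgetting factor in online parameter estimation can be sketched with a far simpler estimator than the MFRLS method named in the abstract. The example below assumes a single scalar parameter, a single forgetting factor, and noise-free synthetic data generated by a hypothetical helper make_data; none of these choices come from the paper. It shows how recursive least squares with forgetting tracks a parameter that changes abruptly.

```python
import math


def rls_forgetting(regressors, outputs, forgetting=0.9, theta0=0.0, p0=100.0):
    """Scalar recursive least squares with one forgetting factor for the model
    y_k = theta_k * phi_k. Returns the parameter estimate after every sample."""
    theta, p = theta0, p0
    estimates = []
    for phi, y in zip(regressors, outputs):
        gain = p * phi / (forgetting + phi * p * phi)
        theta += gain * (y - phi * theta)      # correct with the prediction error
        p = (p - gain * phi * p) / forgetting  # discount old information
        estimates.append(theta)
    return estimates


def make_data(steps=200, jump_at=100):
    """Hypothetical noise-free data: the true parameter jumps from 2 to 3."""
    regressors = [1.0 + 0.5 * math.cos(k) for k in range(steps)]
    true_theta = [2.0 if k < jump_at else 3.0 for k in range(steps)]
    outputs = [t * phi for t, phi in zip(true_theta, regressors)]
    return regressors, outputs, true_theta


if __name__ == "__main__":
    regressors, outputs, true_theta = make_data()
    estimates = rls_forgetting(regressors, outputs)
    print(estimates[99], estimates[-1])
```

A forgetting factor below one keeps the estimator gain from decaying to zero, so the estimate can follow the jump; a multiple-forgetting-factor variant would assign a separate factor to each parameter, which this scalar sketch does not attempt.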