Funding: This work was supported by the research fund of Hanyang University (HY-2019-N) and by the National Key Research & Development Program 2018YFA0701601.
Abstract: This paper proposes a reinforcement learning (RL) algorithm to find an optimal scheduling policy that minimizes the delay under a given energy constraint in a communication system where environment parameters, such as traffic arrival rates, are not known in advance and can change over time. For this purpose, the problem is formulated as an infinite-horizon Constrained Markov Decision Process (CMDP). To handle the constrained optimization problem, we first adopt the Lagrangian relaxation technique. Then, we propose a variant of Q-learning, Q-greedyUCB, that combines the ε-greedy and Upper Confidence Bound (UCB) algorithms to solve this constrained MDP problem. We mathematically prove that the Q-greedyUCB algorithm converges to an optimal solution. Simulation results also show that Q-greedyUCB finds an optimal scheduling strategy and is more efficient than Q-learning with ε-greedy, R-learning, and the Average-payoff RL (ARL) algorithm in terms of cumulative regret. We also show that our algorithm can learn and adapt to changes in the environment, so as to obtain an optimal scheduling strategy under a given power constraint in the new environment.
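The abstract does not specify the paper's state space, reward, or exact exploration schedule, so the sketch below is only a hypothetical toy: a single transmission queue where the Lagrangian relaxation prices energy into the delay reward, and action selection mixes ε-greedy exploitation with a UCB-style exploration step (the queue model, arrival rate, and all constants are assumptions, not the paper's setup; the paper's average-reward formulation is also replaced here by a discounted surrogate for brevity).

```python
import math
import random

random.seed(0)

# Hypothetical toy environment: state = queue backlog (capped at Q_MAX),
# action = number of packets to transmit (0 or 1), Bernoulli(0.4) arrivals.
Q_MAX = 10
ACTIONS = (0, 1)
LAM = 0.5        # Lagrange multiplier pricing energy against delay
ALPHA = 0.1      # learning rate
GAMMA = 0.95     # discount (surrogate for the paper's average-reward setting)
EPS = 0.1        # probability of taking the UCB exploration step
C_UCB = 2.0      # exploration weight in the UCB bonus

Q = {(s, a): 0.0 for s in range(Q_MAX + 1) for a in ACTIONS}
N = {(s, a): 0 for s in range(Q_MAX + 1) for a in ACTIONS}  # visit counts

def select_action(s, t):
    """Exploit greedily w.p. 1-EPS; otherwise pick the UCB-maximizing action."""
    if random.random() < EPS:
        return max(ACTIONS, key=lambda a: Q[(s, a)]
                   + C_UCB * math.sqrt(math.log(t + 2) / (N[(s, a)] + 1)))
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def step(s, a):
    """Serve up to `a` packets, then one random arrival; Lagrangian reward."""
    served = min(s, a)
    arrival = 1 if random.random() < 0.4 else 0
    s_next = min(s - served + arrival, Q_MAX)
    reward = -float(s_next) - LAM * served  # delay cost + priced energy cost
    return s_next, reward

s = 0
for t in range(20000):
    a = select_action(s, t)
    s_next, r = step(s, a)
    N[(s, a)] += 1
    target = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
    s = s_next
```

With the energy price LAM = 0.5 the delay saving outweighs the transmission cost, so the learned greedy policy transmits when the queue is non-empty.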
Funding: Supported by the Lebesgue Center of Mathematics ("Investissements d'avenir" program ANR-11-LABX-0020-01), by CAESARS (ANR-15-CE05-0024), and by MFG (ANR-16-CE40-0015-01). Tang acknowledges research support from the National Science Foundation of China (Grant No. 11631004) and the Science and Technology Commission of Shanghai Municipality (Grant No. 14XD1400400).
Abstract: In this paper, we consider the mixed optimal control of a linear stochastic system with a quadratic cost functional and two controllers: one can choose only deterministic time functions (the deterministic controller), while the other can choose adapted random processes (the random controller). The optimal control is shown to exist under suitable assumptions. It is characterized via a system of fully coupled forward-backward stochastic differential equations (FBSDEs) of mean-field type. We solve the FBSDEs via the solutions of two (but decoupled) Riccati equations, and give the respective optimal feedback laws for both the deterministic and random controllers using the solutions of both Riccati equations. The optimal state satisfies a linear stochastic differential equation (SDE) of mean-field type. Both the singular and infinite time-horizon cases are also addressed.
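The paper's coupled mean-field Riccati pair is not reproduced in the abstract, so the snippet below only illustrates the general mechanism it relies on: integrating a Riccati equation backward from a terminal condition and reading off a feedback gain. It uses the classical scalar deterministic LQ problem (dynamics dx = (Ax + Bu) dt, cost ∫(Qc·x² + R·u²) dt + G·x(T)²), with all symbols and constants illustrative rather than taken from the paper.

```python
# Scalar deterministic LQ Riccati sketch (illustrative constants).
A, B, Qc, R, G = 1.0, 1.0, 1.0, 1.0, 0.5
T, N_STEPS = 1.0, 1000
dt = T / N_STEPS

# Riccati ODE: -dP/dt = 2*A*P - (B*P)**2 / R + Qc, with P(T) = G.
# Integrate backward in time with explicit Euler.
P = G
for _ in range(N_STEPS):
    P += dt * (2 * A * P - (B * P) ** 2 / R + Qc)

# Optimal feedback at t = 0: u = -K * x, with gain K = B * P / R.
K = B * P / R
```

Going backward, P grows from the terminal value G toward the stationary root of 2P - P² + 1 = 0 (here 1 + √2), so the gain K stays positive and bounded.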
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60574016.
Abstract: The infinite-horizon linear quadratic regulation (LQR) problem is settled for discrete-time systems with input delay. With the help of an autoregressive moving average (ARMA) innovation model, solutions to the underlying problem are obtained. The design of the optimal control law involves solving one polynomial equation and one spectral factorization. The latter is the major obstacle in the present problem, and the reorganized innovation approach is used to overcome it. The calculation of the spectral factorization finally comes down to solving two Riccati equations of the same dimension as the original system.
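The abstract's reduction ends at Riccati equations, so as a minimal illustration of that final step, the sketch below iterates the scalar discrete-time algebraic Riccati equation to a fixed point and forms the stabilizing LQR gain. It covers only the delay-free scalar case with made-up constants; the paper's two same-dimension Riccati equations for the input-delay problem are not reconstructed here.

```python
# Scalar discrete-time algebraic Riccati equation (DARE), delay-free case.
# Dynamics x[k+1] = A*x[k] + B*u[k]; cost sum of Qc*x^2 + R*u^2.
A, B, Qc, R = 1.2, 1.0, 1.0, 1.0  # illustrative constants (A is unstable)

# Fixed-point iteration of the DARE:
#   P = Qc + A^2*P - (A*B*P)^2 / (R + B^2*P)
P = Qc
for _ in range(500):
    P = Qc + A * A * P - (A * B * P) ** 2 / (R + B * B * P)

# Stabilizing feedback u[k] = -K * x[k].
K = A * B * P / (R + B * B * P)
closed_loop = A - B * K  # should have magnitude < 1
```

Even though A = 1.2 is open-loop unstable, the converged gain places the closed-loop pole inside the unit circle.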