Q-greedyUCB: a New Exploration Policy to Learn Resource-Efficient Scheduling
Authors: Yu Zhao, Joohyun Lee, Wei Chen. China Communications (SCIE, CSCD), 2021, No. 6, pp. 12-23 (12 pages).
This paper proposes a Reinforcement Learning (RL) algorithm to find an optimal scheduling policy that minimizes delay under a given energy constraint in a communication system where environment parameters, such as traffic arrival rates, are not known in advance and can change over time. For this purpose, the problem is formulated as an infinite-horizon Constrained Markov Decision Process (CMDP). To handle the constrained optimization problem, we first adopt the Lagrangian relaxation technique to solve it. Then, we propose Q-greedyUCB, a variant of Q-learning that combines the ε-greedy and Upper Confidence Bound (UCB) algorithms to solve this constrained MDP problem. We mathematically prove that the Q-greedyUCB algorithm converges to an optimal solution. Simulation results also show that Q-greedyUCB finds an optimal scheduling strategy and is more efficient than Q-learning with ε-greedy, R-learning, and the Average-payoff RL (ARL) algorithm in terms of cumulative regret. We also show that our algorithm can learn and adapt to changes in the environment, so as to obtain an optimal scheduling strategy under a given power constraint for the new environment.
Keywords: reinforcement learning for average rewards; infinite-horizon Markov decision process; upper confidence bound; queue scheduling
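The abstract describes combining Q-learning with a UCB exploration bonus instead of pure ε-greedy action selection. A minimal tabular sketch of that idea follows; all names (`QGreedyUCBAgent`, the bonus coefficient `c`, the Lagrangian penalty in the reward) are illustrative assumptions, and this uses a standard discounted Q-learning update rather than the paper's average-reward formulation or its exact Q-greedyUCB update rule.

```python
import numpy as np

class QGreedyUCBAgent:
    """Tabular Q-learning with a UCB-style exploration bonus (illustrative sketch).

    In the paper's setting the reward would encode the Lagrangian-relaxed
    objective, e.g. r = -delay - lam * energy; here r is just a scalar input.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99, c=1.0):
        self.Q = np.zeros((n_states, n_actions))  # action-value estimates
        self.N = np.zeros((n_states, n_actions))  # per-(state, action) visit counts
        self.t = 0                                # total decision steps
        self.alpha, self.gamma, self.c = alpha, gamma, c

    def select_action(self, s):
        # Pick the action maximizing Q plus a UCB bonus that shrinks as the
        # (state, action) pair is visited more often.
        self.t += 1
        bonus = self.c * np.sqrt(np.log(self.t + 1) / (self.N[s] + 1e-9))
        return int(np.argmax(self.Q[s] + bonus))

    def update(self, s, a, r, s_next):
        # Standard one-step Q-learning update toward the bootstrapped target.
        self.N[s, a] += 1
        target = r + self.gamma * np.max(self.Q[s_next])
        self.Q[s, a] += self.alpha * (target - self.Q[s, a])
```

The exploration bonus plays the role ε plays in ε-greedy: under-visited actions get an optimism boost proportional to `sqrt(log t / N)`, so exploration is directed rather than uniform, which is what the abstract credits for the lower cumulative regret.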