
Application of event-driven Q-learning in call admission control
Abstract: This paper studies the optimal call admission control (CAC) problem under a timed-reward scheme, in which reward accrues per unit of time that a call is in service. A continuous-time Markov decision process (CTMDP) model of the system is established, and an afterstate Q-value updating method is introduced according to the characteristics of the system. An event-driven Q-learning algorithm is then proposed to solve the call admission control problem, and a numerical simulation example is given. The simulation results show that the proposed algorithm requires less memory and converges faster than standard Q-learning. Based on the experimental results, the relationship between the rejection rate of each traffic class and its traffic characteristics under the optimal admission policy is analyzed.
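The abstract does not reproduce the learning rule itself. Q-learning for a discounted CTMDP is commonly built on the continuous-time update of Bradtke and Duff (1995), so a plausible form of the rule underlying the event-driven variant, with sojourn time \(\tau\) in state \(s\) under action \(a\), reward rate \(r(s,a)\) (the timed reward), discount rate \(\beta\), and step size \(\alpha\), is:

```latex
Q(s,a) \leftarrow Q(s,a)
  + \alpha\left[\frac{1-e^{-\beta\tau}}{\beta}\,r(s,a)
  + e^{-\beta\tau}\max_{a'} Q(s',a') - Q(s,a)\right]
```

Updating afterstate values instead, i.e., the value of the post-decision state reached immediately after an accept/reject choice, replaces the state-action table by a table indexed by states alone, which is one way to read the memory saving reported above.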
Source: Journal of Hefei University of Technology (Natural Science), 2011, No. 1, pp. 76-79 (4 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (60873003); Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education (2009AKZR0279); Anhui Provincial Natural Science Foundation (090412046); Key Project of Natural Science Research of Anhui Provincial Universities (KJ2008A058).
Keywords: continuous-time Markov decision processes (CTMDP); event-driven Q-learning; call admission control (CAC)
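To make the event-driven structure concrete, below is a minimal, self-contained sketch, not the paper's algorithm or code, of learning an afterstate value table for a two-class single link: the learner updates only at call-arrival (decision) epochs, accumulates discounted timed reward across intervening departures, and chooses between the "reject" and "accept" afterstates epsilon-greedily. All rates, capacities, rewards, and helper names (options, step, run) are illustrative assumptions.

```python
import math
import random
from collections import defaultdict

# Hypothetical two-class single-link model; every number here is an
# illustrative assumption, not a parameter from the paper.
ARRIVAL = [0.3, 0.2]   # Poisson arrival rate of each call class
SERVICE = [0.5, 0.4]   # exponential service (departure) rate per call
REWARD  = [1.0, 2.0]   # reward earned per unit time by one active call
WIDTH   = [1, 2]       # bandwidth units occupied by one call of each class
CAP     = 5            # total link capacity
BETA, ALPHA, EPS = 0.1, 0.05, 0.1   # discount rate, step size, exploration

V = defaultdict(float)  # value table indexed by afterstates only

def load(s):            # bandwidth currently in use in state s
    return sum(w * n for w, n in zip(WIDTH, s))

def reward_rate(s):     # total timed reward accrued per unit time
    return sum(r * n for r, n in zip(REWARD, s))

def options(s, k):      # feasible afterstates when a class-k call arrives
    accept = tuple(n + (i == k) for i, n in enumerate(s))
    return [s, accept] if load(accept) <= CAP else [s]

def step(s):
    """Sample (sojourn time, next event) of the CTMDP from state s."""
    rates = [(('arrival', k), lam) for k, lam in enumerate(ARRIVAL)]
    rates += [(('departure', k), SERVICE[k] * n)
              for k, n in enumerate(s) if n > 0]
    total = sum(r for _, r in rates)
    tau = random.expovariate(total)
    u = random.random() * total
    for ev, r in rates:
        u -= r
        if u <= 0:
            return tau, ev
    return tau, rates[-1][0]

def run(num_events=200000):
    s = (0, 0)                      # ongoing calls of each class
    prev, g, d = None, 0.0, 1.0     # last afterstate, reward, discount
    for _ in range(num_events):
        tau, (kind, k) = step(s)
        # discounted timed reward earned while sojourning in s
        g += d * reward_rate(s) * (1.0 - math.exp(-BETA * tau)) / BETA
        d *= math.exp(-BETA * tau)
        if kind == 'departure':     # no decision at departures:
            s = tuple(n - (i == k) for i, n in enumerate(s))
            continue                # just track the state and move on
        opts = options(s, k)        # decision epoch: class-k arrival
        best = max(V[o] for o in opts)
        if prev is not None:        # event-driven TD update of V(afterstate)
            V[prev] += ALPHA * (g + d * best - V[prev])
        s = (random.choice(opts) if random.random() < EPS
             else max(opts, key=lambda o: V[o]))
        prev, g, d = s, 0.0, 1.0    # restart accumulation at new afterstate
    return V
```

Because admission decisions occur only at arrivals, the table V grows with the number of feasible occupancy states rather than with state-action pairs over all events, which is consistent with the smaller memory footprint claimed in the abstract.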

References (9)

  • 1 Zhou Yaping, Xi Hongsheng, Yin Baoqun, Tang Hao. Application of continuous-time Markov decision processes in call admission control [J]. Control and Decision, 2001, 16(B11): 795-799.
  • 2 Marbach P, Tsitsiklis J N. A neuro-dynamic programming approach to call admission control in integrated service networks: the single link case [R/OL]. Technical Report LIDS-P-2402, Laboratory for Information and Decision Systems, 1997 [2008-09-06]. http://eprints.kfupm.edu.sa/73464/.
  • 3 Choi J, Kwon T, Choi Y, et al. Call admission control for multimedia services in mobile cellular networks: a Markov decision approach [C]//IEEE International Symposium on Computer Communications, Antibes, 2000: 594-599.
  • 4 Senouci S M, Beylot A, Pujolle G. Call admission control in cellular networks: a reinforcement learning solution [J]. International Journal of Network Management, 2004, 14(2): 89-103.
  • 5 Yu Fei, Wong V W S, Leung V C M. A new QoS provisioning method for adaptive multimedia in wireless networks [J]. IEEE Transactions on Vehicular Technology, 2008, 57(3): 1899-1909.
  • 6 Wang Licun, Zheng Yingping. Q-learning for a class of event-driven Markov decision processes [J]. Systems Engineering and Electronics, 2001, 23(4): 80-82.
  • 7 Das T K, Gosavi A, Mahadevan S, et al. Solving semi-Markov decision problems using average reward reinforcement learning [J]. Management Science, 1999, 45(4): 560-574.
  • 8 Tang Hao, Wan Haifeng, Han Jianghong, Zhou Lei. Cooperative look-ahead control of multi-site CSPS systems based on multi-agent reinforcement learning [J]. Acta Automatica Sinica, 2010, 36(2): 289-296.
  • 9 Yue Feng. Learning optimization control of first-order nonlinear stochastic systems [J]. Journal of Hefei University of Technology (Natural Science), 2010, 33(5): 679-682.


