
Signal timing optimization of a single intersection based on risk avoidance reinforcement learning
Abstract: Most existing reinforcement learning models for signal timing are risk-neutral. Their drawbacks are poor stability and robustness in online learning, long running times, and weak convergence. To address these problems, a risk-avoidance reinforcement learning model for traffic signal timing is formulated, with the queue length difference as the traffic performance index. Simulation experiments were carried out on an integrated VISSIM-Excel VBA-Matlab platform to analyze the effects of the risk-avoidance coefficient on the quality of the timing plan and on convergence. Compared with the risk-neutral reinforcement learning model, the proposed model shows markedly better stability, faster convergence, and comparable performance on the traffic evaluation index. For signal timing optimization problems of this kind, an incremental risk-avoidance reinforcement learning method should be adopted; that is, the risk-avoidance coefficient should be increased in small steps.
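The approach described in the abstract can be sketched as a risk-sensitive Q-learning update in which negative TD errors are over-weighted (so timings with occasionally bad outcomes are avoided), with the risk coefficient raised in small steps as the paper recommends. The following is a minimal illustrative sketch, not the authors' implementation: the toy `step` function standing in for the VISSIM simulation, the coarse state/action coding, and all parameter values (`alpha`, `gamma`, `kappa_step`, `kappa_max`) are assumptions for illustration only.

```python
import random
from collections import defaultdict

random.seed(0)

ACTIONS = [0, 1]    # hypothetical: 0 = keep green split, 1 = shift green time
STATES = [0, 1, 2]  # hypothetical coarse bins of the queue-length difference

def step(state, action):
    """Toy stand-in for the simulated intersection: returns the next state and
    a reward equal to the negative queue-length difference (higher is better)."""
    queue_diff = abs(state - action) + random.choice([0, 1])
    return random.choice(STATES), -queue_diff

def risk_sensitive_update(q, s, a, r, s2, alpha=0.1, gamma=0.9, kappa=0.0):
    """One risk-sensitive TD update: for kappa in (0, 1), negative TD errors
    are over-weighted, biasing the agent away from risky timing actions."""
    delta = r + gamma * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)]
    weight = (1 - kappa) if delta > 0 else (1 + kappa)
    q[(s, a)] += alpha * weight * delta

q = defaultdict(float)
kappa, kappa_step, kappa_max = 0.0, 0.05, 0.6  # incremental risk avoidance
state = random.choice(STATES)
for episode in range(200):
    for _ in range(20):
        a = max(ACTIONS, key=lambda b: q[(state, b)])
        if random.random() < 0.1:              # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        s2, r = step(state, a)
        risk_sensitive_update(q, state, a, r, s2, kappa=kappa)
        state = s2
    kappa = min(kappa + kappa_step, kappa_max)  # small-step increase
```

Ramping `kappa` up gradually, rather than fixing a large value from the start, matches the paper's conclusion that the risk coefficient should be increased in small increments so early learning is not destabilized.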
Source: Journal of Transport Science and Engineering, 2014, No. 1, pp. 80-85.
Funding: National Natural Science Foundation of China (71071024); Natural Science Foundation of Hunan Province (12JJ2025); Changsha Science and Technology Bureau Key Project (K1106004-11).
Keywords: incremental risk avoidance; reinforcement learning; signal timing; simulation


