Abstract
Most existing reinforcement learning models for traffic signal timing are risk-neutral. Their drawbacks are poor stability and robustness in online learning, long running time, and weak convergence. To address these problems, a risk-avoidance reinforcement learning model for traffic signal timing is formulated, with the queue length difference as the traffic performance index. Simulation experiments were conducted on an integrated VISSIM-Excel VBA-Matlab platform to analyze the effect of the risk-aversion coefficient on the quality of the timing plans and on convergence. Compared with the risk-neutral reinforcement learning model, the proposed model shows markedly better stability, faster convergence, and comparable performance on the traffic index. For signal-timing optimization problems, an incremental risk-avoidance reinforcement learning method should be adopted; that is, the risk-aversion coefficient should be increased in small steps.
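The abstract does not give the model's update equations, but the incremental risk-avoidance idea can be sketched with a standard risk-sensitive Q-learning rule (in the style of Mihatsch and Neuneier), where a risk coefficient `kappa` damps positive TD errors and amplifies negative ones, and is raised in small steps across episodes. All function names, parameter values, and the schedule below are illustrative assumptions, not the paper's exact formulation.

```python
from collections import defaultdict

def risk_averse_q_update(Q, s, a, r, s_next, actions,
                         alpha=0.1, gamma=0.9, kappa=0.0):
    """One risk-sensitive Q-learning step (illustrative sketch).

    kappa = 0 recovers risk-neutral Q-learning; kappa in (0, 1)
    makes the agent risk-averse by weighting bad surprises more.
    """
    # Standard temporal-difference error.
    td = r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)]
    # Risk-averse transform: shrink positive TD errors, enlarge negative ones.
    weight = (1.0 - kappa) if td > 0 else (1.0 + kappa)
    Q[(s, a)] += alpha * weight * td

def kappa_schedule(episode, step=0.05, kappa_max=0.8):
    """Incremental risk avoidance: start risk-neutral (kappa = 0) and
    raise the risk coefficient in small steps, as the abstract recommends."""
    return min(episode * step, kappa_max)

# Illustrative use: reward is the negative queue-length difference,
# actions are hypothetical phase decisions at an intersection.
Q = defaultdict(float)
risk_averse_q_update(Q, s="s0", a="extend", r=-2.0, s_next="s1",
                     actions=["extend", "switch"],
                     kappa=kappa_schedule(episode=10))
```

Starting from the risk-neutral case and increasing `kappa` gradually mirrors the paper's conclusion that an abrupt, large risk coefficient harms the timing plan, while small increments preserve performance and speed up convergence.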
Source
Journal of Transport Science and Engineering (《交通科学与工程》), 2014, Issue 1, pp. 80-85 (6 pages)
Funding
National Natural Science Foundation of China (71071024)
Natural Science Foundation of Hunan Province (12JJ2025)
Key Project of the Changsha Science and Technology Bureau (K1106004-11)
Keywords
incremental risk avoidance
reinforcement learning
signal timing
simulation