Abstract
This paper extends the traditional concept of green time equi-saturation and proposes a graded (multi-level) green time equi-saturation. A reward function is constructed for this objective, and four off-line Q-learning signal timing optimization models are built under two modes, fixed cycle and variable cycle. Compared with on-line Q-learning, off-line Q-learning is better suited to signal timing optimization at intersections. The variable-cycle off-line Q-learning model also yields the structure of the solution set and the distribution of the optimal solutions, which traditional timing theory does not provide. Numerical examples show that the optimal solution is unique in the fixed-cycle mode, whereas in the variable-cycle mode it is not unique and forms a band. The optimal solutions of the graded-reward models are more concentrated than those of the ungraded-reward models.
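The abstract does not give the reward function or the state and action definitions, so the following is only a minimal sketch of the general idea for the fixed-cycle case: tabular Q-learning trained off-line against a simple intersection model, with a reward that favors equal degrees of saturation. The two-phase layout, all traffic numbers, the one-second actions, and the ungraded reward `-saturation_gap` are assumptions made for illustration and are not taken from the paper.

```python
import random

# Illustrative two-phase intersection; every number here is an assumption.
CYCLE = 60                   # fixed cycle length (s)
LOST = 10                    # total lost time per cycle (s)
G_MIN = 10                   # minimum green per phase (s)
FLOW = (600.0, 300.0)        # critical demand per phase (veh/h)
SAT_FLOW = (1800.0, 1800.0)  # saturation flow per phase (veh/h)
ACTIONS = (-1, 0, 1)         # change applied to the phase-1 green (s)

EFFECTIVE = CYCLE - LOST                            # green shared by both phases
STATES = list(range(G_MIN, EFFECTIVE - G_MIN + 1))  # feasible phase-1 greens


def saturation_gap(g1):
    """Absolute difference between the two phases' degrees of saturation."""
    g2 = EFFECTIVE - g1
    x1 = FLOW[0] * CYCLE / (SAT_FLOW[0] * g1)
    x2 = FLOW[1] * CYCLE / (SAT_FLOW[1] * g2)
    return abs(x1 - x2)


def step(g1, action):
    """Apply an action, clamp to the feasible range, return (next_g1, reward)."""
    nxt = min(max(g1 + action, STATES[0]), STATES[-1])
    return nxt, -saturation_gap(nxt)  # stand-in reward: smaller gap is better


def train(updates=200_000, alpha=0.5, gamma=0.9, seed=0):
    """Off-line tabular Q-learning: sample (state, action) pairs from the model."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(updates):
        s, a = rng.choice(STATES), rng.choice(ACTIONS)
        nxt, r = step(s, a)
        target = r + gamma * max(q[(nxt, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])
    return q


def greedy_rollout(q, g1, steps=50):
    """Follow the greedy policy from g1 and return the green it settles on."""
    for _ in range(steps):
        best = max(ACTIONS, key=lambda b: q[(g1, b)])
        g1 = step(g1, best)[0]
    return g1
```

Calling `q = train()` and then `greedy_rollout(q, 10)` walks the phase-1 green toward the split with the smallest saturation gap, which for these assumed flows is 33 s out of 50 s of effective green. A graded reward in the spirit of the paper would replace the last line of `step` with a function that maps the gap to discrete reward levels, and a variable-cycle model would add the cycle length to the state.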
Source
Systems Engineering (《系统工程》)
CSSCI
CSCD
PKU Core (北大核心)
2012, No. 7, pp. 117-122 (6 pages)
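Funding
National Natural Science Foundation of China (71071024, 70701006)
Key Scientific Research Project of the Ministry of Education (145)
Scientific Research Project of the Hunan Provincial Department of Education (09A003, 11C0038)
Key Project of the Changsha Science and Technology Bureau (K1106004-11, K1001010-11)
Open Fund of the Key Laboratory of Road Structure and Materials, Ministry of Transport (kfj100206)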
Keywords
Traffic Control
Timing Optimization
Q-Learning
Off-line
Green Time Equi-saturation