Abstract
Reinforcement learning (RL) has shown potential for realizing effective adaptive traffic signal control to reduce traffic congestion. A multiple dynamic reward structure was introduced into the RL reward definition, allowing the reward function type to change in real time with the varying traffic demand level at the intersection. The algorithm was implemented at an isolated multi-phase signalized intersection, and its performance was compared with fixed-time control and with variants of RL schemes; the effects of different reward definitions and action selection strategies on performance were also analyzed. Results show that the proposed algorithm performs more robustly than fixed-time control regardless of the arrival profile. Comparison with RL algorithms including Q-learning and SARSA indicates that the proposed algorithm performs better at higher traffic demand levels regardless of the reward definition. The best reward function is the reduction in cumulative vehicle delay, which yielded the minimum average delay. In terms of action selection, synergizing the ε-greedy and softmax methods allows faster convergence and better online performance.
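The core ideas summarized above can be sketched in code. The snippet below is a minimal illustration, not the paper's implementation: the tabular state representation, the particular way ε-greedy is combined with a softmax draw, and all function names and parameter values (`epsilon`, `tau`, `alpha`, `gamma`) are assumptions introduced for illustration only.

```python
import math
import random

def softmax_probs(q_values, tau=1.0):
    """Boltzmann (softmax) distribution over action values with temperature tau."""
    m = max(q_values)  # subtract max for numerical stability
    exps = [math.exp((q - m) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def select_action(q_values, epsilon=0.1, tau=1.0):
    """One plausible hybrid rule: with probability epsilon, explore by
    sampling from the softmax distribution; otherwise exploit greedily."""
    if random.random() < epsilon:
        probs = softmax_probs(q_values, tau)
        r, cum = random.random(), 0.0
        for a, p in enumerate(probs):
            cum += p
            if r < cum:
                return a
        return len(q_values) - 1
    return max(range(len(q_values)), key=lambda a: q_values[a])

def delay_reduction_reward(prev_cum_delay, cum_delay):
    """Reward defined as the reduction in cumulative vehicle delay
    between two successive signal decisions (positive when delay drops)."""
    return prev_cum_delay - cum_delay

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning backup for state s, action a."""
    q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
```

A "multiple dynamic reward structure" as described in the abstract would switch among such reward functions according to the measured demand level; the switching logic itself is not detailed here.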
Author
XIA Xin-hai (Department of Port and Shipping Management, Guangzhou Maritime University, Guangzhou 510725, China)
Source
Mathematics in Practice and Theory (《数学的实践与认识》)
Peking University Core Journal
2020, No. 22, pp. 153-166 (14 pages)
Funding
Natural Science Foundation of Guangdong Province (2016A030310104)
Guangzhou Philosophy and Social Sciences Development "13th Five-Year" Plan 2020 Project (2020GZGJ299)
Keywords
traffic engineering
reinforcement learning
traffic signal control
intersection