Intersection Traffic Signal Control Based on Dynamic Reward Structure Reinforcement Learning (cited by: 1)
Abstract: Reinforcement learning (RL) can effectively realize adaptive traffic signal control in stochastic, dynamic traffic environments. To adapt to changes in the traffic demand level at an intersection, a multiple dynamic reward structure is introduced into the RL reward definition, so that the type of reward function changes in real time with the demand level. Taking an isolated multi-phase signalized intersection as an example, scenarios with different traffic demand levels and demand variations are designed. The proposed algorithm is compared with fixed-time signal control, Q-learning, and SARSA, and the effects of different reward definitions and action selection strategies on its performance are analyzed. The results show that, in all traffic scenarios, RL-based adaptive signal control is more robust than fixed-time control, regardless of the arrival profile. Under high traffic demand, the proposed algorithm outperforms the Q-learning and SARSA baselines regardless of which reward definition they use. The most suitable reward definition is the reduction in cumulative vehicle delay, which yields the minimum average delay. For action selection, combining the ε-greedy and softmax methods gives faster convergence and better online performance.
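The abstract names two ingredients without giving formulas: a reward defined as the reduction in cumulative delay, and an action selection rule that combines ε-greedy with softmax. The following is a minimal Python sketch of one plausible reading, not the paper's implementation. The function names, the tabular Q-learning update, the temperature parameter, and the choice to exploit greedily with probability 1 - ε and otherwise sample from a softmax distribution are all assumptions made for illustration.

```python
import math
import random


def delay_reduction_reward(prev_cum_delay, curr_cum_delay):
    """Reward as the drop in cumulative vehicle delay over one decision step.

    Positive when delay fell, negative when it grew (assumed sign convention).
    """
    return prev_cum_delay - curr_cum_delay


def softmax_probs(q_values, temperature):
    """Boltzmann distribution over actions; max is subtracted for stability."""
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, epsilon, temperature, rng):
    """Hypothetical mix: greedy with prob. 1 - epsilon, else softmax sampling."""
    if rng.random() >= epsilon:
        return max(range(len(q_values)), key=q_values.__getitem__)
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Standard tabular Q-learning update, applied in place."""
    best_next = max(q_table[next_state])
    td_error = reward + gamma * best_next - q_table[state][action]
    q_table[state][action] += alpha * td_error


if __name__ == "__main__":
    rng = random.Random(0)
    # Two toy states, two signal phases as actions (illustrative only).
    q_table = {0: [0.0, 0.0], 1: [1.0, 2.0]}
    action = select_action(q_table[0], epsilon=0.1, temperature=1.0, rng=rng)
    reward = delay_reduction_reward(prev_cum_delay=120.0, curr_cum_delay=90.0)
    q_update(q_table, 0, action, reward, 1, alpha=0.5, gamma=0.9)
    print(action, q_table[0])
```

Sampling exploratory actions from a softmax rather than uniformly biases exploration toward phases that already look promising, which is one way such a combination could speed up convergence. A dynamic reward structure would additionally switch the reward function by demand level, but the abstract does not give the switching rule, so it is left out here.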
Author: XIA Xin-hai (夏新海), Department of Port and Shipping Management, Guangzhou Maritime University, Guangzhou 510725, China
Source: Mathematics in Practice and Theory (《数学的实践与认识》), Peking University Core Journal, 2020, No. 22, pp. 153-166 (14 pages)
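Funding: Natural Science Foundation of Guangdong Province (2016A030310104); Guangzhou Philosophy and Social Sciences Development "13th Five-Year Plan" 2020 Project (2020GZGJ299).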
Keywords: traffic engineering; reinforcement learning; traffic signal control; intersection