Adaptive Traffic Signal Control Based on Dueling Recurrent Double Q Network

Cited by: 5
Abstract: Most research on traffic signal control algorithms based on reinforcement learning lacks robustness verification in special scenarios, such as road traffic restrictions, and relies on information that is hard to obtain in real scenarios, such as high-precision positioning data. To adaptively coordinate traffic flow more effectively and reliably, the Dueling Double Deep Recurrent Q Network (3DRQN) algorithm is proposed, with the goal of reducing the stopping and waiting time of vehicles. The algorithm is based on the deep Q network and uses a dueling architecture, a double Q network, and a target network to improve learning performance. It further incorporates a long short-term memory (LSTM) network to encode historical state information, reducing the dependence on the current state and making the algorithm more robust. To address the low positioning accuracy and the difficulty of obtaining vehicle waiting times in practical applications, a low-resolution state space and a reward function based on traffic pressure were designed. An intersection traffic flow model was built in the SUMO simulator and tested with traffic data collected at an intersection in Chibi, Hubei Province. The method was compared with Webster's fixed-time control, a fully actuated signal control strategy, and an adaptive strategy based on the Dueling Double Deep Q Network (3DQN). The results show that the average vehicle waiting time of 3DRQN was more than 25% lower than that of the three baselines. Across scenarios with different traffic volumes and left-turn ratios, the average waiting time of 3DRQN rose noticeably as the left-turn ratio and traffic volume increased, but the algorithm still performed well: at a volume of 1800 pcu·h⁻¹ with a 50% left-turn ratio, 3DRQN reduced the average vehicle waiting time by about 15% compared with 3DQN, about 24% compared with the actuated method, and about 33% compared with the fixed-time method. The algorithm also adapts well to special scenarios such as traffic surges, restricted road capacity, and sensor failure; even in the extreme case where 50% of the sensors failed, it outperformed the fixed-time strategy by more than 10%. These results show that the 3DRQN algorithm delivers good control performance, effectively reduces vehicle stopping and waiting times, and is robust.
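The abstract names the main ingredients of 3DRQN: an LSTM over a history of low-resolution states, a dueling value/advantage split, a double Q target with a separate target network, and a pressure-based reward. The sketch below illustrates how these pieces typically fit together, assuming a PyTorch implementation; the layer sizes, state encoding, and all names (DuelingRecurrentQNet, double_q_target, pressure_reward) are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the 3DRQN ideas from the abstract (hypothetical, not the paper's code).
import torch
import torch.nn as nn

class DuelingRecurrentQNet(nn.Module):
    """Dueling Q network with an LSTM encoding a window of past
    low-resolution intersection states (assumed flat feature vectors)."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        # Dueling streams: scalar state value V(s) and per-action advantages A(s, a).
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, states, hx=None):
        # states: (batch, seq_len, state_dim) -- a history of observations,
        # so the policy does not depend only on the current time step.
        z = self.encoder(states)
        out, hx = self.lstm(z, hx)
        h = out[:, -1]                                  # summary of the history
        v, a = self.value(h), self.advantage(h)
        # Standard dueling aggregation: Q = V + (A - mean(A)).
        return v + a - a.mean(dim=1, keepdim=True), hx

@torch.no_grad()
def double_q_target(online, target, next_states, rewards, gamma=0.99):
    """Double-Q target: the online network selects the action,
    the (periodically synced) target network evaluates it."""
    q_online, _ = online(next_states)
    best = q_online.argmax(dim=1, keepdim=True)
    q_target, _ = target(next_states)
    return rewards + gamma * q_target.gather(1, best).squeeze(1)

def pressure_reward(incoming_counts, outgoing_counts):
    """Pressure-style reward: negative absolute difference between vehicle
    counts on incoming and outgoing approaches. Counts need only coarse
    detection, avoiding per-vehicle waiting times from precise positioning."""
    return -abs(sum(incoming_counts) - sum(outgoing_counts))
```

A pressure-based reward of this kind is attractive in practice because approach-level vehicle counts can come from ordinary loop detectors, whereas waiting-time rewards require tracking individual vehicles.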
Authors: LU Li-ping, CHENG Ken, CHU Duan-feng, WU Chao-zhong, QIU Yu-jie (School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, Hubei, China; Intelligent Transportation Systems Research Center, Wuhan University of Technology, Wuhan 430063, Hubei, China)
Source: China Journal of Highway and Transport (indexed in EI, CAS, CSCD, Peking University Core), 2022, No. 8, pp. 267-277 (11 pages)
Funding: National Key Research and Development Program of China (2021YFB2501104).
Keywords: traffic engineering; intersection signal control; deep reinforcement learning; deep Q network