Abstract
This paper proposes an intelligent traffic signal control algorithm for pedestrian safety based on deep deterministic policy gradient (DDPG). Using real-time observations of intersection data, the algorithm jointly considers pedestrian safety and vehicle traffic efficiency and adaptively adjusts the signal cycle length, phase sequence, and phase duration to achieve safe and efficient intersection control. Prioritized experience replay is adopted to improve sampling efficiency and accelerate convergence. Because pedestrian safety and vehicle traffic efficiency conflict with each other, the reinforcement learning reward function is carefully designed to trade off the number of pedestrian-vehicle conflicts caused by pedestrian violations against vehicle speed, guiding the traffic signal to learn pedestrian behavior at the intersection and the best timing scheme. Simulation results show that, in a dynamic environment, the algorithm outperforms the existing fixed-timing scheme and other intelligent timing schemes in the number of pedestrian-vehicle conflicts, average vehicle speed, waiting time, and queue length.
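The abstract names two mechanisms without giving their form: a reward that trades pedestrian-vehicle conflicts against vehicle speed, and prioritized experience replay. The following is a minimal sketch of both, not the authors' implementation. The linear reward, the weight `w_safety`, the speed limit `max_speed`, the class `PrioritizedReplay`, and the hyperparameters `alpha`, `beta`, and `eps` are all assumptions chosen for illustration.

```python
import random


def reward(conflicts, mean_speed, max_speed=13.9, w_safety=0.5):
    """Hypothetical reward: penalize conflicts, reward normalized speed.

    The weighting and the linear form are assumptions; the abstract only
    states that conflicts and vehicle speed are traded off.
    """
    efficiency = mean_speed / max_speed  # roughly in [0, 1]
    return (1.0 - w_safety) * efficiency - w_safety * conflicts


class PrioritizedReplay:
    """Proportional prioritized replay on plain lists (O(n) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []
        self.pos = 0  # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions take the current maximum priority so that they
        # are likely to be replayed soon after being stored.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idx = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the batch maximum
        # (a common simplification of normalizing over the whole buffer).
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idx, [self.data[i] for i in idx], weights

    def update(self, indices, td_errors):
        # Priority follows the magnitude of the temporal-difference error.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a DDPG training loop, the critic's temporal-difference errors for a sampled batch would be passed to `update`, and the returned weights would scale the critic loss. A production buffer would typically use a sum tree instead of lists to make sampling logarithmic in the buffer size.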
Authors
ZHANG Qianlong (张乾隆)
HU Zhiqun (胡智群)
XIAO Hailin (肖海林)
School of Computer and Information Engineering, Hubei University, Wuhan 430062, China
Source
Computer Measurement & Control (《计算机测量与控制》)
2022, Issue 4, pp. 114-120 (7 pages)
Funding
National Natural Science Foundation of China (61901163).
Keywords
traffic signal light
dynamic timing
reinforcement learning
pedestrian safety
vehicle efficiency
prioritized experience replay