Signal Control of Single Intersection Based on Improved Deep Reinforcement Learning Method

Cited by: 16
Abstract Using deep reinforcement learning to control intersection signals is a research hotspot in intelligent transportation. Existing studies mainly focus on describing traffic states comprehensively within a reinforcement learning formulation and on designing effective reinforcement learning algorithms for the signal timing problem. However, they often overlook the influence of the signal state on action selection and the efficiency of data sampling from the experience pool, which leads to unstable training and slow convergence. To address this, the paper incorporates the signal state into the agent's state design and introduces action reward and punishment coefficients to adjust the agent's action selection, so that the minimum and maximum green time constraints of each phase are satisfied. Meanwhile, considering the short-term temporal correlation of traffic flow, the paper uses Priority Sequence Experience Replay (PSER) to update the priorities of sequence samples in the experience pool, so that the agent obtains preceding correlated samples that better match the traffic conditions. A double Q network and a dueling Q network are then used to further improve the performance of the DQN (Deep Q Network) algorithm. Finally, taking the single intersection of Shixinzhong Road and Shanyin Road in Xiaoshan District, Hangzhou, as an example, the algorithm is verified on the simulation platform SUMO (Simulation of Urban Mobility). Experimental results show that the proposed agent model outperforms the unconstrained single-state model, and that the algorithm built on it effectively reduces the average vehicle waiting time and the total queue length at the intersection. Its control performance is better than both the actual signal timing strategy and the traditional DQN algorithm.
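The abstract names three standard ingredients without giving formulas: a dueling Q network, a double Q network, and priority updates that reach back to preceding samples in a sequence. The Python sketch below illustrates the textbook form of each, not the paper's implementation. The function names, the `decay` and `window` parameters, and the specific backward-decay rule for priorities are assumptions made for illustration; the abstract does not state the exact update rule the authors use.

```python
from typing import List


def dueling_q(value: float, advantages: List[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, gamma: float,
                      next_q_online: List[float],
                      next_q_target: List[float],
                      done: bool) -> float:
    """Double DQN target: the online network picks the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)),
                      key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


def propagate_priority(priorities: List[float], index: int,
                       new_priority: float,
                       decay: float = 0.5,
                       window: int = 3) -> List[float]:
    """Assumed sequence-priority rule (hypothetical): set the priority of
    the sample at `index`, then raise the priorities of up to `window`
    preceding samples to at least new_priority * decay**k."""
    updated = list(priorities)
    updated[index] = new_priority
    for k in range(1, window + 1):
        j = index - k
        if j < 0:
            break
        updated[j] = max(updated[j], new_priority * decay ** k)
    return updated
```

For example, `propagate_priority([0.1, 0.1, 0.1, 0.1], 3, 0.8, decay=0.5, window=2)` raises the two samples immediately before index 3 while leaving the earliest sample unchanged, which is one way preceding, temporally correlated samples could become more likely to be replayed.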
Authors LIU Zhi (刘志); CAO Shi-peng (曹诗鹏); SHEN Yang (沈阳); YANG Xi (杨曦) (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)
Source Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2020, No. 12, pp. 226-232 (7 pages)
Funding Zhejiang Provincial Public Welfare Technology Research Program (LGG20F030008); Zhejiang Provincial Natural Science Foundation (LY20F030018).
Keywords Signal control; Action reward and punishment coefficient; Weighted multi-index coefficient; Priority sequence experience replay; Deep Q Network