Coordinated Variable Speed Limit Control for Freeway Based on Multi-Agent Deep Reinforcement Learning

Abstract: To meet the need for coordinated variable speed limit (VSL) control across multiple freeway segments, and to address the difficulty of training and optimizing efficiently in a high-dimensional parameter space, this paper proposes a coordinated freeway VSL control method based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm. Unlike existing studies that use the single-agent deep deterministic policy gradient (DDPG) algorithm, MADDPG abstracts each control unit as an agent with an Actor-Critic reinforcement learning architecture and shares the agents' state and action information during training. This lets each agent infer the control strategies of the other agents, which in turn enables coordinated multi-segment control. The method is evaluated in the open-source simulator SUMO on a typical freeway congestion scenario. The results show that the proposed MADDPG algorithm reduces congestion duration by 69.23% and the standard deviation of segment operating speed by 47.96%, which significantly improves traffic efficiency and safety. Compared with single-agent DDPG, MADDPG saves 50% of the training time and raises the cumulative return by 7.44%, indicating that the multi-agent algorithm improves the efficiency of optimizing the coordinated control strategy. Finally, to verify that sharing information among agents is necessary, MADDPG is compared with independent multi-agent DDPG (IDDPG): relative to IDDPG, MADDPG increases the improvement in congestion duration and in mean speed standard deviation by 11.65% and 19.00%, respectively.
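
The difference between sharing and not sharing information during training can be illustrated with a small sketch. This is not the authors' implementation: the number of agents, the state size, the speed-limit range, and the random linear "actor" are all assumptions made for illustration. It only shows the information structure described in the abstract, where each actor acts on its own local state, an IDDPG-style critic sees one agent's state and action, and a MADDPG-style critic sees the states and actions of all agents.

```python
import math
import random

# All constants below are hypothetical values chosen for illustration only.
N_AGENTS = 3                      # number of controlled freeway segments
OBS_DIM = 4                       # size of each agent's local traffic state
VSL_MIN, VSL_MAX = 60.0, 120.0    # assumed speed-limit range in km/h


def make_actor(rng):
    """Random linear weights standing in for a trained actor network."""
    return [rng.uniform(-1.0, 1.0) for _ in range(OBS_DIM)]


def act(weights, obs):
    """Decentralized actor: maps one agent's local state to a speed limit."""
    z = sum(w * o for w, o in zip(weights, obs))
    # tanh squashes to [-1, 1]; rescale into the assumed speed-limit range.
    return VSL_MIN + (math.tanh(z) + 1.0) / 2.0 * (VSL_MAX - VSL_MIN)


def independent_critic_input(obs, action):
    """IDDPG-style critic input: only the agent's own state and action."""
    return list(obs) + [action]


def centralized_critic_input(all_obs, all_actions):
    """MADDPG-style critic input: states and actions of every agent."""
    joint = [x for obs in all_obs for x in obs]
    joint.extend(all_actions)
    return joint


rng = random.Random(0)
actors = [make_actor(rng) for _ in range(N_AGENTS)]
all_obs = [[rng.random() for _ in range(OBS_DIM)] for _ in range(N_AGENTS)]
all_actions = [act(w, o) for w, o in zip(actors, all_obs)]

local_input = independent_critic_input(all_obs[0], all_actions[0])
joint_input = centralized_critic_input(all_obs, all_actions)
print(len(local_input), len(joint_input))
```

With these assumed sizes, the independent critic receives 5 inputs while the centralized critic receives 15. The actors still use only local states, so the shared information is needed during training only.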

Authors: YU Rongjie; XU Ling; ZHANG Ruici (Key Laboratory of Road and Traffic Engineering of the Ministry of Education, Tongji University, Shanghai 201804, China; Zhejiang Hangshaoyong Expressway Co., Ltd., Hangzhou 310000, China)

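Affiliations: Key Laboratory of Road and Traffic Engineering of the Ministry of Education, Tongji University; Zhejiang Hangshaoyong Expressway Co., Ltd.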

Source: Journal of Tongji University (Natural Science), 2024, No. 7, pp. 1089-1098 (10 pages). Indexed in: EI, CAS, CSCD, PKU Core.

Funding: Science and Technology Program of the Zhejiang Provincial Department of Transportation (2021047).

Keywords: traffic engineering; coordinated variable speed limit control; multi-agent deep reinforcement learning; traffic jam; freeway; traffic efficiency; traffic safety

Classification number: U491.5 [Transportation Engineering: Transportation Planning and Management]
