
Deep deterministic policy gradient based cooperative platoon longitudinal control strategy (基于深度确定性策略梯度的队列纵向协同控制策略)

Cited by: 5
Abstract: To address continuous, precise vehicle control and longitudinal string stability in vehicle platoons, a platoon longitudinal control strategy based on deep reinforcement learning (DRL) is proposed for moderate-speed conditions. The strategy accounts for the three key factors that affect platoon safety, namely inter-vehicle spacing, vehicle speed, and vehicle acceleration, and treats vehicle dynamics and ride comfort as constraints during policy learning. First, a reinforcement-learning model of platoon longitudinal control is established. Second, a DRL procedure is proposed that iterates on the longitudinal control policy with the goal of obtaining the optimal control policy for each vehicle; a multi-objective reward function is designed that combines the rewards corresponding to the distance error, the speed error, and the acceleration constraint. Finally, the deep deterministic policy gradient (DDPG) algorithm is adopted to solve the platoon longitudinal control problem. DDPG combines the strengths of actor-critic (AC) networks with those of the deep Q-network (DQN), which lets it handle platoon control over continuous state and continuous action spaces. A DDPG-based platoon control model is designed and trained to verify the effectiveness of the strategy. The results show that the proposed reinforcement-learning-based method achieves control accuracy comparable to that of a distributed model predictive control algorithm, and achieves platoon string stability under the predecessor-leader following communication topology.
Authors: MIN Hai-gen (闵海根); YANG Yi-ming (杨一鸣); WANG Wu-qi (王武祺); FANG Yu-kun (方煜坤); SONG Xiao-peng (宋晓鹏)
Affiliations: School of Information & Engineering, Chang'an University, Xi'an 710064, Shaanxi, China; Joint Laboratory for Internet of Vehicles, Ministry of Education-China Mobile Communications Corporation, Chang'an University, Xi'an 710064, Shaanxi, China; Zhejiang Transportation Planning and Design Institute Co., Ltd, Hangzhou 310017, Zhejiang, China
Source: Journal of Chang’an University (Natural Science Edition) (《长安大学学报(自然科学版)》), 2021, No. 4, pp. 90-100 (11 pages). Indexed in: CAS, CSCD, Peking University Core Journals (北大核心).
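Funding: National Natural Science Foundation of China (61903046); Key Research and Development Program of Shaanxi Province (2021GY-290); Key Research and Development Program of Zhejiang Province (2020C01057); Fund of the "Internet of Vehicles" Ministry of Education-China Mobile Joint Laboratory (教技司(2016)477号).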
Keywords: traffic engineering; deep reinforcement learning; platoon longitudinal control; deep deterministic policy gradient; platoon string stability
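
The abstract describes a multi-objective reward that combines a distance-error term, a speed-error term, and an acceleration-constraint term, but it does not give the functional form. The following is only a minimal sketch of what such a reward could look like for a single following vehicle. The quadratic penalties, the weight values, the 3.0 m/s² comfort limit, and the names `RewardWeights` and `platoon_reward` are all illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical trade-off weights; the abstract gives no values."""
    distance: float = 1.0
    speed: float = 0.5
    accel: float = 0.1


def platoon_reward(gap, desired_gap, ego_speed, lead_speed, accel,
                   accel_limit=3.0, weights=RewardWeights()):
    """Scalar reward for one follower at one time step (illustrative only).

    gap, desired_gap      -- actual and target spacing to the preceding vehicle (m)
    ego_speed, lead_speed -- follower and preceding-vehicle speeds (m/s)
    accel                 -- commanded follower acceleration (m/s^2)
    accel_limit           -- assumed comfort bound on |accel| (m/s^2)
    """
    distance_error = gap - desired_gap
    speed_error = ego_speed - lead_speed
    # Only the part of the acceleration beyond the comfort bound is penalised.
    accel_excess = max(0.0, abs(accel) - accel_limit)
    penalty = (weights.distance * distance_error ** 2
               + weights.speed * speed_error ** 2
               + weights.accel * accel_excess ** 2)
    # Reward is the negative penalty: zero is the best achievable value.
    return -penalty
```

In a DDPG setup, a scalar of this kind would be the per-step reward the critic learns to predict, while the actor outputs the continuous acceleration command; how the paper actually weights and shapes the three terms is not stated in the abstract.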