Abstract
To obtain a trajectory tracking controller for autonomous driving that performs well and is practical to deploy, this paper applies the twin delayed deep deterministic policy gradient (TD3) deep reinforcement learning algorithm to the lateral control of trajectory tracking. The controller is designed for the lane keeping scenario. First, the neural network structure and its parameters are designed based on the TD3 algorithm, and the state space and action output are defined according to the behavior of human drivers, so that the controller trains quickly and controls well. Then, a reward function is designed that takes both tracking accuracy and comfort as optimization objectives for controller performance. Finally, to verify the performance of the designed controller, several scenarios are built in Prescan according to the ISO 11270:2014(E) standard and simulation experiments are conducted. Comparison with the experimental results of current mainstream trajectory tracking solutions shows that the controller meets the application requirements and achieves better control performance in both tracking accuracy and comfort, indicating high application value.
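The two ideas named in the abstract, a reward that weighs tracking accuracy against comfort and the TD3 update target, can be illustrated with a minimal sketch. The abstract does not give the paper's actual reward terms, weights, state variables, or hyperparameters, so everything below is an assumption made for illustration: the quadratic penalty form, the names `tracking_reward` and `td3_target`, and all default values are hypothetical. Only the general shape of the TD3 target (clipped noise on the target action, minimum of two target critics) follows the standard published algorithm.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def tracking_reward(lateral_error_m, heading_error_rad, steer_rate_rad_s,
                    w_lat=1.0, w_head=0.5, w_comfort=0.1):
    """Hypothetical reward: penalise tracking error and abrupt steering.

    The weights and the quadratic form are assumptions, not the paper's.
    """
    accuracy_penalty = (w_lat * lateral_error_m ** 2
                        + w_head * heading_error_rad ** 2)
    comfort_penalty = w_comfort * steer_rate_rad_s ** 2
    return -(accuracy_penalty + comfort_penalty)


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0,
               rng=random):
    """Standard TD3 target for one transition with a scalar action.

    target_actor(state) -> action and target_q*(state, action) -> value
    are placeholder callables standing in for the target networks.
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise,
                       -max_action, max_action)
    # Clipped double Q-learning: take the smaller of the two critic estimates.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    # No bootstrapping from terminal states.
    return reward + gamma * (0.0 if done else 1.0) * q_min
```

In this sketch, raising `w_comfort` relative to `w_lat` would favor smoother steering over tighter tracking; how the paper balances the two is not stated in the abstract.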
Authors
张炳力
佘亚飞
ZHANG Bingli; SHE Yafei (School of Automobile and Traffic Engineering, Hefei University of Technology, Hefei 230009, China)
Source
《合肥工业大学学报(自然科学版)》
CAS
PKU Core (北大核心)
2023, Issue 7, pp. 865-872 (8 pages)
Journal of Hefei University of Technology: Natural Science
Funding
Supported by the Anhui Provincial Major Science and Technology Project (JZ2022AKKZ0111).
Keywords
autonomous driving
trajectory tracking
deep reinforcement learning
twin delayed deep deterministic policy gradient (TD3) algorithm
reward function