Abstract
Self-evolving lane-change policies for autonomous driving are expected to improve performance in complex, open traffic environments and to cope with more unseen scenarios. Temporal difference learning for model predictive control (TD-MPC) combines the advantages of model-based and model-free reinforcement learning, and offers high learning efficiency and strong performance. On this basis, to improve the overall performance of the automated lane-change policy, an integrated automated lane-change policy based on TD-MPC is proposed. Specifically, for the automated lane-change problem, an integrated policy architecture based on a driving propensity network is proposed; the reinforcement learning problem is formulated and a complete reward function is designed, so that the decision-making and planning optimization problems are solved in a unified way. The TD-MPC algorithm is used to design an internal model that predicts future states and rewards, which enables local trajectory optimization over a short horizon, while temporal difference learning is used to estimate the long-term return and thereby obtain the parameters of the driving propensity network. The proposed method is verified in a high-fidelity simulation environment. The results show that, compared with a rule-based scheme, the proposed method maintains driving efficiency while improving safety and comfort. Compared with the soft actor-critic (SAC) algorithm, it improves learning efficiency by a factor of 7 to 9.
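The combination described in the abstract, a short-horizon rollout through an internal model plus a value estimate for the long-term return, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the one-dimensional lateral-offset state, the toy dynamics, reward and value functions, and all function names are hypothetical. Exhaustive search over a small discrete action set is used in place of the sampling-based planner and learned latent model of TD-MPC.

```python
from itertools import product


def score_sequence(state, actions, dynamics, reward, value, gamma):
    """Discounted model-predicted rewards over the horizon plus a terminal value."""
    total, discount = 0.0, 1.0
    for a in actions:
        total += discount * reward(state, a)   # predicted one-step reward
        state = dynamics(state, a)             # predicted next state
        discount *= gamma
    # The value estimate stands in for the return beyond the planning horizon.
    return total + discount * value(state)


def plan(state, dynamics, reward, value, action_set, horizon, gamma):
    """Return the first action of the best-scoring sequence (exhaustive search)."""
    best = max(
        product(action_set, repeat=horizon),
        key=lambda seq: score_sequence(state, seq, dynamics, reward, value, gamma),
    )
    return best[0]


def td_target(r, next_state, value, gamma):
    """One-step temporal difference target for training the value estimate."""
    return r + gamma * value(next_state)


# Hypothetical toy problem: the state is the lateral offset from the target lane centre.
def toy_dynamics(offset, a):
    return offset + 0.5 * a


def toy_reward(offset, a):
    return -(offset ** 2) - 0.1 * a ** 2   # penalise offset and control effort


def toy_value(offset):
    return -(offset ** 2)                  # stand-in for a learned value function


first_action = plan(2.0, toy_dynamics, toy_reward, toy_value,
                    action_set=(-1.0, 0.0, 1.0), horizon=3, gamma=1.0)
print(first_action)
```

In this toy setting the planner steers toward the lane centre because the terminal value penalises any offset remaining at the end of the horizon, which is the role the abstract assigns to the long-term return estimate.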
Authors
YANG Shuo (杨硕)
LI Shizhen (李时珍)
ZHAO Zhongyuan (赵中原)
HUANG Xiaopeng (黄小鹏)
HUANG Yanjun (黄岩军)
(School of Automotive Studies, Tongji University, Shanghai 201804; College of Automation, Nanjing University of Information Science and Technology, Nanjing 210044; China Electronics Technology Eastern Communication Group Co., Ltd., Guangdong 519060)
Source
Journal of Mechanical Engineering (《机械工程学报》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2024, Issue 10, pp. 329-338 (10 pages in total)
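Funding
Supported by the National Natural Science Foundation of China, Joint Fund for Enterprise Innovation and Development (Grant No. U23B2061).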
Keywords
autonomous driving
reinforcement learning
integrated decision making and planning