Abstract: Unsignalized intersections pose a challenge for autonomous vehicles, which must decide how to navigate them safely and efficiently. This paper proposes a reinforcement learning (RL) method that enables autonomous vehicles to navigate unsignalized intersections safely and efficiently. The method uses a semantic scene representation to handle a variable number of vehicles and a universal reward function to facilitate stable learning. A collision risk function is designed to penalize unsafe actions and guide the agent away from them. A scalable policy optimization algorithm is introduced to improve data efficiency and safety during vehicle learning at intersections. The algorithm employs experience replay to overcome the on-policy limitation of proximal policy optimization (PPO) and incorporates the collision risk constraint into the policy optimization problem. The proposed safe RL algorithm balances the trade-off between traffic safety and policy learning efficiency. Simulated intersection scenarios with different traffic situations are used to test the algorithm, demonstrating high success rates and low collision rates under different traffic conditions. The algorithm shows the potential of RL for enhancing the safety and reliability of autonomous driving systems at unsignalized intersections.
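The abstract does not say how experience replay and the collision risk constraint are combined with PPO. The sketch below shows one common way to do it and is not the authors' method. Each replayed transition stores the log-probability assigned by the policy that collected it, so the PPO ratio is computed against that behavior policy and old samples can be reused. The collision risk constraint is handled by Lagrangian relaxation with a dual-ascent update of the multiplier, which is an assumed technique. The names `Transition`, `clipped_surrogate`, `constrained_objective`, `update_multiplier`, `lam`, and `risk_limit` are hypothetical. The same applies to the per-step `cost` field, which stands in for the paper's collision risk function.

```python
import math
from dataclasses import dataclass


@dataclass
class Transition:
    # One replayed sample. The behavior-policy log-prob is stored so the
    # sample can still be used after the current policy has changed.
    logp_behavior: float
    advantage: float
    cost: float  # hypothetical per-step collision-risk value, e.g. in [0, 1]


def clipped_surrogate(logp_new: float, t: Transition, eps: float = 0.2) -> float:
    """PPO clipped term, with the ratio taken against the stored behavior policy."""
    ratio = math.exp(logp_new - t.logp_behavior)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * t.advantage, clipped * t.advantage)


def constrained_objective(logps_new, batch, lam, risk_limit, eps=0.2):
    """Lagrangian objective to maximize: mean surrogate - lam * (risk - limit).

    Returns the objective and the importance-weighted estimate of the
    expected per-step collision risk under the current policy.
    """
    n = len(batch)
    surrogate = sum(clipped_surrogate(lp, t, eps) for lp, t in zip(logps_new, batch)) / n
    risk = sum(math.exp(lp - t.logp_behavior) * t.cost for lp, t in zip(logps_new, batch)) / n
    return surrogate - lam * (risk - risk_limit), risk


def update_multiplier(lam: float, risk: float, risk_limit: float, lr: float = 0.1) -> float:
    """Dual ascent: raise lam while risk exceeds the limit, never below zero."""
    return max(0.0, lam + lr * (risk - risk_limit))
```

For example, take two replayed transitions whose behavior probability was 0.5 and whose new probabilities are 0.6 and 0.4. The mean clipped surrogate is then 0.2 and the estimated risk is 0.4. With `lam=1.0` and `risk_limit=0.1`, the objective becomes 0.2 - 0.3 = -0.1, and the multiplier rises to 1.03. The penalty therefore tightens only while the policy is estimated to be too risky. This is one way to express the safety versus learning-efficiency trade-off the abstract mentions.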
Funding: Supported by the National Natural Science Foundation of China (52102394, 52172384), the Hunan Provincial Natural Science Foundation of China (2023JJ10008), and the Young Elite Scientists Sponsorship Program by CAST (2022QNRC001).