Ensuring the safety of pedestrians is essential and challenging when autonomous vehicles are involved. Classical pedestrian avoidance strategies cannot handle uncertainty, and learning-based methods lack performance guarantees. In this paper we propose a hybrid reinforcement learning (HRL) approach that lets autonomous vehicles interact safely with pedestrians whose behavior is uncertain. The method integrates a rule-based strategy and a reinforcement learning strategy. The confidence of each strategy is evaluated using the data recorded during training. We then design an activation function that selects the policy with the higher confidence as the final policy. In this way, we can guarantee that the performance of the final policy is not worse than that of the rule-based policy. To demonstrate the effectiveness of the proposed method, we validate it in simulation, using an accelerated testing technique to generate stochastic pedestrians. The results indicate that it increases the success rate for pedestrian avoidance to 98.8%, compared with 94.4% for the baseline method.
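The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how confidence is computed, so the sketch assumes a lower confidence bound on the mean recorded return per discretized state, and the names `ConfidenceTracker` and `select_policy` are hypothetical.

```python
import math
from collections import defaultdict


class ConfidenceTracker:
    """Running statistics of recorded returns per state (hypothetical helper)."""

    def __init__(self):
        self.n = defaultdict(int)       # number of recorded returns per state
        self.mean = defaultdict(float)  # running mean of returns per state
        self.m2 = defaultdict(float)    # running sum of squared deviations

    def record(self, state, ret):
        # Welford's online update of mean and variance.
        self.n[state] += 1
        delta = ret - self.mean[state]
        self.mean[state] += delta / self.n[state]
        self.m2[state] += delta * (ret - self.mean[state])

    def lower_bound(self, state, z=1.96):
        # Assumed confidence measure: lower bound on the mean return.
        # With fewer than two samples there is no evidence, so return -inf.
        n = self.n.get(state, 0)
        if n < 2:
            return -math.inf
        var = self.m2[state] / (n - 1)
        return self.mean[state] - z * math.sqrt(var / n)


def select_policy(state, rule_conf, rl_conf):
    """Assumed activation function: use the RL policy only when its
    confidence is strictly higher; otherwise fall back to the rules."""
    if rl_conf.lower_bound(state) > rule_conf.lower_bound(state):
        return "rl"
    return "rule"
```

Under these assumptions, falling back to the rule-based policy whenever the learned policy has not shown higher confidence is what keeps the combined policy from doing worse than the rules in states where the recorded data are sparse.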
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61872217, U20A20285, 52122217, and U1801263) and the Key R&D Projects of the Ministry of Science and Technology of China (No. 2020YFB1710901).