The problem of designing a rebalancing algorithm for a large-scale ride-hailing system with asymmetric (unbalanced) demand is considered here. We pose the rebalancing problem within a semi-Markov decision problem (SMDP) framework with closed queues of vehicles serving stationary but asymmetric demand over a large city with multiple stations (representing neighborhoods). We assume that the passengers queue up at every station until they are matched with a vehicle. The goal of the SMDP is to minimize a convex combination of the waiting time of the passengers and the total empty vehicle miles traveled. The resulting SMDP appears to be difficult to solve for a closed-form expression of the optimal rebalancing strategy. Consequently, we use a deep reinforcement learning algorithm to determine an approximately optimal solution to the SMDP. We show through extensive Monte Carlo simulations that the trained policy outperforms other well-known state-dependent rebalancing strategies.
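The objective described above, a convex combination of passenger waiting time and empty vehicle miles, can be illustrated with a minimal sketch. This is not the authors' formulation: the function name `step_cost`, the weight `alpha`, and the use of the total number of queued passengers as a stand-in for waiting time are all assumptions made for illustration.

```python
def step_cost(queue_lengths, empty_miles, alpha):
    """Hypothetical per-step cost: a convex combination of two terms.

    queue_lengths: passengers currently waiting at each station
                   (assumed proxy for waiting time, not from the paper)
    empty_miles:   miles driven without a passenger during this step
    alpha:         weight in [0, 1] on the waiting term (assumed name)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    waiting = float(sum(queue_lengths))
    return alpha * waiting + (1.0 - alpha) * float(empty_miles)


# Three stations with 2, 0 and 1 waiting passengers, 4 empty miles driven.
cost = step_cost([2, 0, 1], empty_miles=4.0, alpha=0.5)
```

Setting `alpha` to 1 penalizes only waiting passengers, and setting it to 0 penalizes only empty miles; intermediate values trade the two off.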