On-demand food delivery (OFD) is becoming increasingly popular in modern society. As a core order-assignment mechanism in OFD scenarios, order recommendation directly affects the delivery efficiency of the platform and the delivery experience of riders. This paper addresses the dynamic nature of the order recommendation problem and proposes a reinforcement learning solution method. An actor-critic network based on the long short-term memory (LSTM) unit is designed to handle order-grabbing conflicts between different riders. In addition, three rider sequencing rules are proposed to map different riders to different time steps of the LSTM unit. To evaluate the proposed method, extensive experiments are conducted on real data from the Meituan delivery platform. The results show that the proposed reinforcement-learning-based order recommendation method significantly increases the number of grabbed orders and reduces the number of order-grabbing conflicts, yielding better delivery efficiency for the platform and a better delivery experience for riders.
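The abstract does not specify the network's inputs, outputs, or the three sequencing rules, so the following is only a minimal sketch, under stated assumptions, of how an LSTM-based actor-critic could process riders one per time step. Everything concrete here is hypothetical and not taken from the paper: each rider is described by a position and a current load, each order by a pickup location, the actor scores orders by projecting the hidden state into order-feature space, conflicts are avoided by masking orders already recommended to riders earlier in the sequence, and the critic is a linear value head on the final hidden state. The rule names in `SEQUENCING_RULES` are placeholders, not the paper's three rules. Weights are random and untrained; only the forward pass is shown, not the actor-critic training update.

```python
import math
import random

# Minimal, untrained sketch. All feature choices, weights, and rule names are hypothetical.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class TinyLSTMCell:
    """Plain-Python LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        n_cat = n_in + n_hidden
        self.W = {k: [[rng.uniform(-0.5, 0.5) for _ in range(n_cat)] for _ in range(n_hidden)]
                  for k in "ifog"}
        self.b = {k: [0.0] * n_hidden for k in "ifog"}

    def step(self, x, h, c):
        z = x + h  # concatenate current input with previous hidden state
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


# Hypothetical rider sequencing rules: each gives a sort key that decides
# which LSTM time step a rider occupies.
SEQUENCING_RULES = {
    "by_id": lambda r: r["id"],
    "least_loaded_first": lambda r: (r["load"], r["id"]),
    "most_loaded_first": lambda r: (-r["load"], r["id"]),
}


def softmax_masked(logits, mask):
    # Unavailable orders (mask False) get probability 0; assumes at least one is available.
    m = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, mask)]
    s = sum(exps)
    return [e / s for e in exps]


def recommend(riders, orders, rule, cell, W_q, w_v):
    """Actor: one LSTM step per rider, greedy pick over still-available orders.
    Critic: linear value of the final hidden state."""
    seq = sorted(riders, key=SEQUENCING_RULES[rule])
    h = [0.0] * cell.n_hidden
    c = [0.0] * cell.n_hidden
    available = [True] * len(orders)
    assignment = {}
    for rider in seq:
        x = [rider["x"], rider["y"], float(rider["load"])]
        h, c = cell.step(x, h, c)
        if not any(available):
            assignment[rider["id"]] = None  # more riders than orders
            continue
        q = matvec(W_q, h)  # project hidden state into order-feature space
        logits = [sum(a * b for a, b in zip(q, o)) for o in orders]
        probs = softmax_masked(logits, available)
        j = max(range(len(orders)), key=lambda k: probs[k])
        assignment[rider["id"]] = j
        available[j] = False  # later riders never get this order, so no grabbing conflict
    value = sum(a * b for a, b in zip(w_v, h))
    return assignment, value


rng = random.Random(0)
cell = TinyLSTMCell(n_in=3, n_hidden=8, rng=rng)
W_q = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(2)]  # actor projection
w_v = [rng.uniform(-1, 1) for _ in range(8)]                      # critic weights
riders = [{"id": 0, "x": 0.1, "y": 0.2, "load": 2},
          {"id": 1, "x": 0.5, "y": 0.5, "load": 0},
          {"id": 2, "x": 0.9, "y": 0.1, "load": 1}]
orders = [(0.2, 0.3), (0.6, 0.4), (0.8, 0.2), (0.4, 0.9)]  # pickup (x, y)
assignment, value = recommend(riders, orders, "least_loaded_first", cell, W_q, w_v)
print(assignment, round(value, 3))
```

In this sketch, because each rider occupies its own LSTM time step, the sequencing rule changes both the hidden state each rider sees and which orders remain available when that rider's turn comes, which illustrates why some rule for ordering riders is needed at all.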
Funding: Supported in part by the National Natural Science Foundation of China (No. 62273193), the Tsinghua University-Meituan Joint Institute for Digital Life, and the Research and Development Project of CRSC Research & Design Institute Group Co., Ltd.