Abstract
To address the difficulty of taxi dispatching at large transportation hubs, airports in particular, this paper proposes a driver decision-making method based on improved deep reinforcement learning, taking the perspective of the taxi driver's profit. First, the airport environment and the environment of the city where the airport is located are simulated, and the driver's states and actions, the rewards obtained from interacting with the environment, and the state transitions are defined. Then, the driver's state parameters are fed into a DQN as inputs, and the DQN is used to fit the state-action value function (Q-value function). Finally, the driver repeatedly makes decisions according to an ε-greedy strategy, and the DQN parameters are updated according to the reward function. Experimental results show that, in simulated environments such as large, medium, and small cities, drivers can use the model to obtain quantitative expected returns for each currently available decision action and make reasonable decisions, thereby completing the taxi dispatching process automatically.
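The two ingredients named in the abstract, ε-greedy action selection and a reward-driven update of the Q-value estimate, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the action labels, the discount factor `gamma`, and the numbers are hypothetical, and plain Python lists stand in for the DQN's outputs. The abstract does not specify the network, the action set, or the reward function, so this is not the authors' implementation.

```python
import random

# Hypothetical action labels; the abstract does not list the actual action set.
ACTIONS = ["wait_in_airport_queue", "return_to_city_empty"]


def epsilon_greedy(q_values, epsilon, rng):
    """Return a random action index with probability epsilon, else the greedy index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma, done):
    """One-step Q-learning target: r if the episode ended, else r + gamma * max Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


rng = random.Random(0)
q_values = [12.0, 9.5]  # stand-in for the network's Q-value outputs in one state
action = epsilon_greedy(q_values, epsilon=0.1, rng=rng)
target = td_target(reward=30.0, next_q_values=[10.0, 8.0], gamma=0.9, done=False)
print(ACTIONS[action], target)
```

In a DQN, the squared difference between this target and the network's current estimate for the chosen action would serve as the training loss; that step is omitted here to keep the sketch free of any deep learning library.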
Authors
王鹏勇
陈龚涛
赵江烁
WANG Peng-yong; CHEN Gong-tao; ZHAO Jiang-shuo (School of Mathematics, China University of Mining and Technology, Xuzhou 221100, China)
Source
《计算机与现代化》
2020, No. 8, pp. 94-99, 104 (7 pages in total)
Computer and Modernization
Funding
Undergraduate Innovation Training Program of China University of Mining and Technology (20190510).
Keywords
taxi dispatching
deep reinforcement learning
DQN
Q-value function