Decision-making Method for Airport Taxi Drivers Based on Deep Reinforcement Learning
Abstract To address the difficulty of dispatching taxis at large transportation hubs, with airports as the representative case, this paper proposes a driver decision-making method based on improved deep reinforcement learning, taking the perspective of the taxi driver's profit. First, the airport environment and the urban environment in which the airport is located are simulated, and the driver's states, actions, rewards obtained from interacting with the environment, and state transitions are defined. Then, the driver's state parameters are fed into a DQN, which is used to fit the state-action value function (Q-value function). Finally, the DQN parameters are updated by repeatedly having the driver make decisions under an ε-greedy policy and applying the reward function. Experimental results show that, in simulated large, medium, and small cities and other environments, drivers can use the model to obtain a quantitative expected return for each available action and make reasonable decisions, so that the taxi dispatching process is completed automatically.
Authors WANG Peng-yong; CHEN Gong-tao; ZHAO Jiang-shuo (School of Mathematics, China University of Mining and Technology, Xuzhou 221100, China)
Source Computer and Modernization (《计算机与现代化》), 2020, No. 8, pp. 94-99, 104 (7 pages)
Funding Undergraduate Innovation Training Program of China University of Mining and Technology (20190510).
Keywords taxi dispatching; deep reinforcement learning; DQN; Q-value function
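
The decision loop summarized in the abstract (ε-greedy action selection, then an update of the Q-value approximator from the observed reward) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no network architecture, state features, action set, or hyperparameters, so the action names, the two-dimensional state, and the `LinearQ` class below are all hypothetical. A linear function approximator stands in for the DQN to keep the example free of external dependencies, and the update uses the standard one-step Q-learning target rather than the paper's "improved" variant, whose details are not stated here.

```python
import random

# Hypothetical action set; the abstract does not list the driver's actions.
ACTIONS = ["wait_at_airport", "return_to_city"]


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action index, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma, done):
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


class LinearQ:
    """Linear stand-in for the DQN (an assumption): Q(s, a) = w[a] . s + b[a]."""

    def __init__(self, state_dim, n_actions):
        self.w = [[0.0] * state_dim for _ in range(n_actions)]
        self.b = [0.0] * n_actions

    def predict(self, state):
        """Return the estimated Q-value of every action in the given state."""
        return [
            sum(wi * si for wi, si in zip(w, state)) + b
            for w, b in zip(self.w, self.b)
        ]

    def update(self, state, action, target, lr):
        """One gradient step on 0.5 * (target - Q(s, a))^2 for the taken action."""
        error = target - self.predict(state)[action]
        self.w[action] = [
            wi + lr * error * si for wi, si in zip(self.w[action], state)
        ]
        self.b[action] += lr * error
        return error
```

A single training step would call `predict` on the current state, choose an action with `epsilon_greedy`, observe a reward and next state from the simulated environment, and pass `td_target(...)` to `update`. Once trained, the values returned by `predict` play the role of the quantitative expected return per action that the abstract says drivers can read off from the model.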