Journal article

Solving vehicle routing problem using deep reinforcement learning (cited by 3)
Abstract: As one of the most classic combinatorial optimization problems in transportation and logistics, the vehicle routing problem (VRP) has remained an active subject of research for decades. Intelligent logistics, characterized by large data scales, significant uncertainty, and demanding timeliness, poses new challenges for solving the VRP efficiently and intelligently, and has driven research into solving it with artificial intelligence methods. Scholars at home and abroad have investigated the application of deep reinforcement learning (DRL) to the VRP, but the results obtained so far still leave room for optimization. Hence, this paper proposes a deep Q-network (DQN) method whose action selection is improved by the upper confidence bounds applied to trees (UCT) algorithm. This DRL method solves the VRP end-to-end by defining the interaction between the agent and the environment and constructing solutions by selecting nodes one at a time. First, a DRL framework is established for the vehicle routing problem with loading-capacity constraints (the capacitated VRP, CVRP): the DRL optimization objective and the Markov decision process for this setting are designed, and the process is completed by specifying the state-action space, the reward function, and other elements. A state-action value network is then designed based on the attention mechanism of the Transformer architecture, the rectified linear unit (ReLU) activation function, and backpropagation with the adaptive moment estimation (Adam) gradient descent algorithm. Second, to address the value-function overestimation and limited exploration of the DQN method, the UCT algorithm is used to improve the action-selection strategy, balancing exploration and exploitation in policy decision-making so as to improve the performance and convergence of the method. Experimental results show that the improved DQN performs well: applied to the CVRP, it improves on the conventional DQN by 1.89%, 1.10%, and 2.17% on problem sizes of 20, 50, and 100 nodes (CVRP-20, -50, and -100), respectively, demonstrating the favorable performance and generalization ability of the proposed method.
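The UCT-style action selection described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the exploration constant `c`, the function name `ucb_action`, and the toy Q-values are assumptions. The idea is that, instead of greedy or plain epsilon-greedy selection over the Q-network's outputs, each candidate action's value is augmented with an upper-confidence exploration bonus that shrinks as that action is tried more often.

```python
import math

def ucb_action(q_values, visit_counts, total_visits, c=1.4):
    """Select an action by a UCB1-style rule: the network's Q-value plus
    an exploration bonus that decays with the action's visit count."""
    best_action, best_score = None, -math.inf
    for a, q in enumerate(q_values):
        if visit_counts[a] == 0:
            return a  # try every unvisited action at least once
        bonus = c * math.sqrt(math.log(total_visits) / visit_counts[a])
        score = q + bonus
        if score > best_score:
            best_action, best_score = a, score
    return best_action

# Toy example: actions are candidate customer nodes; the Q-values would
# come from the (hypothetical) state-action value network, and the visit
# counts from the search statistics. Action 2 has a lower Q-value but a
# large exploration bonus because it has barely been visited.
q = [0.5, 0.9, 0.4]
counts = [10, 10, 1]
chosen = ucb_action(q, counts, total_visits=sum(counts))  # selects action 2
```

Because the bonus term vanishes as visit counts grow, the rule converges toward greedy selection over the learned Q-values while still forcing early exploration of rarely tried nodes, which is how it mitigates the exploration limitations of a plain DQN.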
Authors: HUANG Yan; ZHANG Jin (School of Transportation and Logistics, Southwest Jiaotong University, Chengdu 611756, China; National United Engineering Laboratory of Integrated and Intelligent Transportation, Chengdu 611756, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Chengdu 611756, China)
Source: Journal of Transportation Engineering and Information (《交通运输工程与信息学报》), 2022, No. 3, pp. 114-127 (14 pages)
Funding: Key R&D Project of the Sichuan Provincial Department of Science and Technology (2019YFG0001)
Keywords: information technology; vehicle routing problem; deep reinforcement learning; deep Q-network; Transformer; upper confidence bounds applied to trees (UCT)