End-to-end deep reinforcement learning framework for multi-depot vehicle routing problem
Cited by: 4
Abstract To improve the efficiency of solving the multi-depot vehicle routing problem (MDVRP), this paper proposes an end-to-end deep reinforcement learning framework. First, the MDVRP is formulated as a Markov decision process (MDP), with definitions of its state, action, and reward. An improved graph attention network (GAT) then serves as the encoder, producing feature embeddings of the graph representation of the MDVRP, and a Transformer-based decoder is designed. The encoder-decoder model is trained with an improved REINFORCE algorithm. The model is not constrained by the size of the graph: once trained, it can be applied to instances with any number of depots and customers. Finally, results on randomly generated instances and on published standard instances verify the feasibility and effectiveness of the proposed framework. Even on MDVRP instances with 100 customer nodes, the trained model needs only 2 ms on average to obtain a solution that is highly competitive with existing methods.
Authors Lei Kun (雷坤); Guo Peng (郭鹏); Wang Qixin (王祺欣); Zhao Wenchao (赵文超); Tang Liansheng (唐连生) (School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031, China; Technology & Equipment of Rail Transit Operation & Maintenance Key Laboratory of Sichuan Province, Southwest Jiaotong University, Chengdu 610031, China; School of Economics & Management, Ningbo University of Technology, Ningbo, Zhejiang 315211, China)
Source Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 10, pp. 3013-3019 (7 pages)
Funding Major Humanities and Social Sciences Research Program of Zhejiang Provincial Universities (2018QN060).
Keywords multi-depot vehicle routing problem; deep reinforcement learning; graph neural network; REINFORCE algorithm; Transformer model
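
The abstract states that the MDVRP is modeled as an MDP with a state, an action, and a reward, but this record does not give the definitions. The sketch below shows one plausible formulation, purely as an illustration: the names `Instance`, `State`, `feasible_actions`, `step`, and `greedy_policy` are hypothetical, the reward is assumed to be the negative travel distance, and a nearest-neighbour rule stands in for the trained GAT/Transformer policy. It also assumes every customer demand fits within the vehicle capacity.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Instance:
    """Hypothetical MDVRP instance: nodes 0..n_depots-1 are depots, the rest customers."""
    coords: List[Tuple[float, float]]
    demand: List[float]          # depots carry demand 0
    n_depots: int
    capacity: float


@dataclass
class State:
    """Assumed MDP state: vehicle position, home depot, load left, customers served."""
    position: Optional[int] = None      # None means no route is currently open
    home_depot: Optional[int] = None
    remaining: float = 0.0
    visited: set = field(default_factory=set)


def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def feasible_actions(state, inst):
    """Mask: which node indices may be selected next."""
    if state.position is None:                       # open a new route at any depot
        return list(range(inst.n_depots))
    customers = [j for j in range(inst.n_depots, len(inst.coords))
                 if j not in state.visited and inst.demand[j] <= state.remaining]
    if state.position == state.home_depot:           # route just opened: serve someone
        return customers
    return customers + [state.home_depot]            # or close the route at home


def step(state, action, inst):
    """Apply an action in place and return the reward (negative distance travelled)."""
    if state.position is None:                       # choosing a start depot is free
        state.position = state.home_depot = action
        state.remaining = inst.capacity
        return 0.0
    reward = -dist(inst.coords[state.position], inst.coords[action])
    if action == state.home_depot:                   # route closed
        state.position = state.home_depot = None
    else:                                            # customer served
        state.visited.add(action)
        state.remaining -= inst.demand[action]
        state.position = action
    return reward


def done(state, inst):
    return state.position is None and len(state.visited) == len(inst.coords) - inst.n_depots


def greedy_policy(state, actions, inst):
    """Stand-in for the learned policy: nearest feasible node, else return home."""
    if state.position is None:
        unvisited = [j for j in range(inst.n_depots, len(inst.coords))
                     if j not in state.visited]
        return min(actions, key=lambda d: min(dist(inst.coords[d], inst.coords[j])
                                              for j in unvisited))
    customers = [a for a in actions if a >= inst.n_depots]
    if customers:
        return min(customers, key=lambda j: dist(inst.coords[state.position], inst.coords[j]))
    return state.home_depot


def rollout(inst, policy):
    """Run one episode; return the total reward and the list of routes."""
    state, total, routes = State(), 0.0, []
    while not done(state, inst):
        action = policy(state, feasible_actions(state, inst), inst)
        if state.position is None:
            routes.append([])
        routes[-1].append(action)
        total += step(state, action, inst)
    return total, routes


# Toy instance: two depots, four unit-demand customers, capacity 2.
inst = Instance(coords=[(0, 0), (10, 0), (1, 0), (2, 0), (9, 0), (8, 0)],
                demand=[0, 0, 1, 1, 1, 1], n_depots=2, capacity=2)
total, routes = rollout(inst, greedy_policy)
print(total, routes)
```

In a learned setting, `greedy_policy` would be replaced by sampling from the decoder's masked output distribution, and the episode's total reward would feed the REINFORCE gradient estimate. How the paper itself defines these quantities is not stated in this record.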