期刊文献+

基于PPO算法的旅行商问题求解模型 被引量:1

A Traveling Salesman Problem Solver Based on Proximal Policy Optimization
下载PDF
导出
摘要 旅行商问题,即TSP(Traveling Salesman Problem)问题,是经典计算模型中的NP-hard问题。也因为其为NP-hard,所以从理论上来说目前并没有多项式时间的算法可以快速计算出给定图的实例所对应的TSP旅行路线,即tour。近些年来,对于小规模的图(顶点数不超过100,称为TSP100),人们提出了基于神经网络模型的方法去计算出tour。特别的,在[Kwon等人,NIPS 2020]中,Kwon等人提出了POMO(Policy Optimization with Multiple Optima)模型,对TSP100问题可以给出接近目前启发式策略所能获得的最短tour,且相应的计算时间相比较于启发式策略加快了近一个数量级。本文基于PPO(Proximal Policy Optimization)算法,对该模型进行了微调(fine-tune),将其在TSP100相关的测试集上的平均tour长度从7.80改进到7.791,而目前不基于学习的启发式算法所能找到最短的平均tour长度为7.76。本文中的结果更加接近于目前的最好结果,但相比启发式策略,得到结果的时间大大缩短。 The Traveling Salesman Problem(TSP) is an NP-hard problem in the classic computational model. As it is NP-hard, theoretically, there does not exist any polynomial algorithm for it. So for a given graph instance, we cannot get the TSP tour efficiently. In recent years, for graphs of moderate size(the number of vertices is no greater than 100, referred to as TSP 100), there are models proposed based on neural networks. Especially in [Kwon et al., NIPS 2020], Kwon et al. proposed POMO(Policy Optimization with Multiple Optima), which gives nearly identical shortest tours compared with those results achieved by heuristic approaches. Also, the inference time is sped up by more than an order of magnitude, compared with heuristic algorithms. Here in the paper, we proposed a training method based on PPO(Proximal Policy Optimization), which fine-tunes over the POMO model, improving the average tour length from 7.80 to 7.791. At the same time, the best result so far obtained by algorithms based on heuristic and not learning-based achieves the tour length of 7.76. The result in the paper is closer to this best result. Also, compared to heuristic approaches, the time to get the tour is much less.
作者 贝世之 严嘉钰 章乐 BEI Shizhi;YAN Jiayu;ZHANG Le(Beijing Electronic Science and Technology Institute,Beijing 100070,P.R.China)
出处 《北京电子科技学院学报》 2021年第4期88-95,共8页 Journal of Beijing Electronic Science And Technology Institute
基金 中央高校基本科研业务费项目328201904资助。
关键词 旅行商问题 强化学习 策略梯度算法 traveling salesman problem reinforcement learning policy optimization algorithms
  • 相关文献

参考文献1

二级参考文献1

共引文献3

同被引文献10

引证文献1

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部