摘要
有容量车辆路径问题是组合优化问题中比较热门的问题,它属于经典的NP-hard问题并且时间复杂度高.本文提出了一种基于策略梯度的超启发算法,将强化学习中的确定性策略梯度算法引入到超启发算法的高层策略中的底层算法选择策略,确定性策略梯度算法采用Actor-Critic框架,另外为了能够在后续计算和神经网络参数更新中引用历史经验数据,在确定性策略梯度算法中设计了经验池用于存储状态转移数据.在超启发算法解的接受准则方面,文中通过实验对比了3种接受准则的效果,最终选择了自适应接受准则作为高层策略中解的接受准则.通过对有容量车辆路径问题标准算例的计算,并将求解结果与其他算法对比,验证了所提算法在该问题求解上的有效性和稳定性.
The capacitated vehicle routing problem is popular in combinatorial optimization.It is a classic NP-hard problem with high time complexity.This paper proposed a hyper-heuristic algorithm based on policy gradient.The deterministic policy gradient algorithm in reinforcement learning is introduced into the low-level algorithm selection strategy in the high-level heuristic strategy of the hyper-heuristic algorithm.The deterministic policy gradient algorithm adopts the Actor-Critic framework.In addition,to reference historical experience data in subsequent calculations and parameter updates of neural networks,the experience pool is designed to store state transition data in a deterministic policy gradient algorithm.In terms of the acceptance criteria of the hyper-heuristic algorithm,the paper compared the effects of the three acceptance criteria through experiments,and finally,the adaptive acceptance criterion is chosen as the acceptance criterion in the high-level heuristic strategy.The effectiveness and stability of the proposed algorithm in solving the capacitated vehicle routing problem are verified by calculating the standard example and comparing with the results of other algorithms.
作者
张景玲
孙钰粟
赵燕伟
余孟凡
蒋玉勇
ZHANG Jing-ling;SUN Yu-su;ZHAO Yan-wei;YU Meng-fan;JIANG Yu-yong(Key Laboratory of Special Equipment Manufacturing and Advanced Processing Technology,Zhejiang University of Technology,Hangzhou Zhejiang 310014,China)
出处
《控制理论与应用》
EI
CAS
CSCD
北大核心
2024年第6期1111-1122,共12页
Control Theory & Applications
基金
国家自然科学基金项目(61402409)
浙江省自然科学基金项目(LY19F030017)资助。