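Self-Memorized Deep Reinforcement Learning Model for Solving the Multidimensional Knapsack Problem

Published English title: Based Self Memorized Deep Reinforcement Learning Model for Solving Multidimensional Knapsack Problem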

Abstract: To address the high dimensionality and tight constraints of the multidimensional knapsack problem (MKP), this paper proposes a self-memorized learn-to-improve model (SML2I). A deep reinforcement learning mechanism selects the operator applied at each step of the iterative search: the model learns from the current solution and from the solutions visited earlier in the search, and decides whether to apply an improvement strategy or a perturbation strategy to the current solution. On this basis, a hash table is introduced and two effective perturbation operators based on value density are designed. The hash table records the solutions visited during the search, which prevents the model from exploring the same solution repeatedly. The value-density-based perturbation strategy generates a new solution that is completely different from the previous ones, so applying the improvement strategy again to the perturbed solution remains effective. Tests on 89 MKP datasets, with comparisons against advanced solution methods from the literature, verify the feasibility and effectiveness of SML2I for solving the MKP.
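The search loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SML2I implementation. The learned policy is replaced here by a fixed hand-written rule (improve first, perturb only when the improved solution has already been visited), and the function names, the density definition (profit divided by capacity-normalised weight), and the drop-and-refill perturbation are all assumptions made for illustration. A Python `set` of tuples plays the role of the hash table.

```python
def is_feasible(x, weights, capacities):
    """True if every constraint sum_j weights[i][j] * x[j] <= capacities[i] holds."""
    return all(
        sum(w * xj for w, xj in zip(row, x)) <= cap
        for row, cap in zip(weights, capacities)
    )


def total_profit(x, profits):
    return sum(p * xj for p, xj in zip(profits, x))


def value_density(j, profits, weights, capacities):
    """Assumed definition: profit per unit of capacity-normalised weight."""
    load = sum(row[j] / cap for row, cap in zip(weights, capacities))
    return profits[j] / load if load > 0 else float("inf")


def improve(x, profits, weights, capacities, banned=()):
    """Greedily add unpacked items in descending density while feasible."""
    y = list(x)
    order = sorted(
        range(len(y)),
        key=lambda j: value_density(j, profits, weights, capacities),
        reverse=True,
    )
    for j in order:
        if y[j] == 0 and j not in banned:
            y[j] = 1
            if not is_feasible(y, weights, capacities):
                y[j] = 0  # undo: item j does not fit
    return y


def perturb(x, profits, weights, capacities, k):
    """Drop the k lowest-density packed items, then refill without them."""
    packed = sorted(
        (j for j in range(len(x)) if x[j] == 1),
        key=lambda j: value_density(j, profits, weights, capacities),
    )
    dropped = packed[:k]
    y = list(x)
    for j in dropped:
        y[j] = 0
    return improve(y, profits, weights, capacities, banned=dropped)


def search(profits, weights, capacities, steps=10):
    """Hand-written stand-in for the learned operator-selection policy."""
    x = [0] * len(profits)
    visited = {tuple(x)}  # hash table of solutions seen so far
    best = list(x)
    for _ in range(steps):
        y = improve(x, profits, weights, capacities)
        if tuple(y) in visited:
            # Improvement led nowhere new: try stronger and stronger perturbations.
            y = None
            for k in range(1, sum(x) + 1):
                cand = perturb(x, profits, weights, capacities, k)
                if tuple(cand) not in visited:
                    y = cand
                    break
            if y is None:
                break  # every reachable neighbour was already visited
        visited.add(tuple(y))
        if total_profit(y, profits) > total_profit(best, profits):
            best = list(y)
        x = y
    return best, visited
```

Calling `search(profits, weights, capacities)` returns the best solution found and the set of visited solutions. Storing solutions as tuples keeps the membership test at roughly constant cost, which is the property the hash table is used for in the abstract.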

Authors: SHENG Jiahao (盛佳浩); MA Liang (马良); LIU Yong (刘勇) (Business School, University of Shanghai for Science and Technology, Shanghai 200093, China)

Affiliation: Business School, University of Shanghai for Science and Technology (上海理工大学管理学院)

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2024, No. 9, pp. 2137-2148 (12 pages)

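Funding: Supported by the Shanghai Philosophy and Social Science Planning Project (2019BGL014) and the Ministry of Education Humanities and Social Sciences Research Youth Fund Project (21YJC630087).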

Keywords: multidimensional knapsack problem; deep reinforcement learning; multi-hash; neighborhood operator; policy gradient

Classification number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]

