Reinforcement Learning Model of Dynamic Newsboy Problem with Framework Protocol

Abstract: To stabilize their supply of goods and their supplier relationships, enterprises often sign framework agreements with suppliers for a fixed period. To address how a retailer should purchase newsboy-type products under such a framework agreement, this paper builds an inventory decision model with reinforcement learning and uses the Q-learning algorithm to find a near-optimal ordering policy. Demand is simulated with randomly generated samples, and ordering by Q-learning is compared with ordering by traditional methods. Across repeated numerical experiments, ordering with the reinforcement learning method yields an average profit about 7%~22% higher than the traditional ordering methods (fixed-quantity ordering, moving-average forecasting, and exponential smoothing), and its average profit differs from that of the ideal state by about 8%. These findings verify the effectiveness and feasibility of reinforcement learning for solving inventory problems. The paper also studies how changes in the relevant parameters affect total profit, and finds that profit decreases as the greedy rate (ε) increases and increases as the learning rate (α) increases. This conclusion can provide a new way of thinking about related inventory problems.
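The approach described in the abstract can be illustrated with a minimal tabular Q-learning sketch. It is not the authors' model: the abstract does not give the state definition, the cost parameters, the demand distribution, or the framework-agreement constraints, so everything below is an assumption made for illustration. The state is taken to be the previous period's demand, demand is drawn uniformly from a small integer range, the price, cost, and salvage values are invented, and ε is treated as the probability of a random exploratory order. The function names `profit`, `train_q_learning`, and `greedy_policy` are hypothetical.

```python
import random


def profit(order, demand, price=10.0, cost=6.0, salvage=2.0):
    """Single-period newsvendor profit (price, cost, salvage are assumed values)."""
    sold = min(order, demand)          # cannot sell more than was ordered
    leftover = order - sold            # unsold units are salvaged
    return price * sold + salvage * leftover - cost * order


def train_q_learning(demands, actions, alpha=0.1, gamma=0.9, epsilon=0.1,
                     steps=20000, seed=0):
    """Tabular Q-learning; state = previous period's demand (an assumption)."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in actions} for s in demands}
    state = rng.choice(demands)
    for _ in range(steps):
        # Epsilon-greedy choice: explore with probability epsilon.
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(q[state], key=q[state].get)
        # Simulate demand with a random sample (uniform here, by assumption).
        demand = rng.choice(demands)
        reward = profit(action, demand)
        next_state = demand
        # Standard Q-learning update toward reward + gamma * max_a' Q(s', a').
        best_next = max(q[next_state].values())
        q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Order quantity with the highest learned value in each state."""
    return {s: max(q[s], key=q[s].get) for s in q}


if __name__ == "__main__":
    demands = list(range(5, 16))   # hypothetical demand levels
    actions = list(range(5, 16))   # hypothetical order quantities
    policy = greedy_policy(train_q_learning(demands, actions))
    print(policy)
```

Because demand in this sketch is independent across periods, the previous demand carries no real information, so the learned order quantities should be similar across states. A model closer to the paper would presumably encode the framework agreement in the state or the reward, which the abstract does not specify.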

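Authors: QI Yu-qing; ZHAO Xing-lei; ZHAO Tian-dong-jie (School of Economics and Management, Nanjing Tech University, Nanjing 211816, China)

Affiliation: School of Economics and Management, Nanjing Tech University

Source: Operations Research and Management Science (《运筹与管理》), indexed in CSSCI, CSCD, and the Peking University Core Journals list; 2022, No. 10, pp. 105-112 (8 pages)

Funding: National Natural Science Foundation of China Young Scientists Fund project (71701092); National Social Science Fund of China project (20BGL025).

Keywords: inventory model; framework agreement; Q-learning algorithm

Classification number: F224 [Economics and Management, National Economy]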
