Reinforcement Learning Model of Dynamic Newsboy Problem with Framework Protocol
Abstract To stabilize supply and supplier relationships, enterprises often sign framework agreements with suppliers that cover a fixed period. To address how a retailer should purchase newsboy (newsvendor) products under such a framework agreement, this paper builds an inventory decision model using reinforcement learning and applies the Q-learning algorithm to find a near-optimal ordering policy. Demand is simulated by generating random samples, and ordering by Q-learning is compared with ordering by traditional methods. Across repeated numerical experiments, ordering with reinforcement learning yields an average profit about 7%~22% higher than the traditional methods (fixed-quantity ordering, moving-average forecasting, and exponential smoothing), and its average profit differs from that of the ideal state by about 8%. These findings support the effectiveness and feasibility of reinforcement learning for inventory problems. The paper also studies how parameter changes affect total profit: profit decreases as the greedy rate (ε) increases and increases as the learning rate (α) increases. These conclusions can offer a new approach to related inventory problems.
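The abstract does not give the paper's state, action, or reward definitions, so the following is only a minimal sketch of tabular Q-learning applied to a repeated newsboy problem, not the authors' model. Everything specific in it is an assumption made for illustration: the state is the previous period's demand, the action is the order quantity, demand is drawn uniformly from 0 to `max_qty`, the price, cost, and salvage values are arbitrary, ε is treated as the probability of a random (exploratory) order, and the framework agreement itself is not modeled. The update used is the standard Q-learning rule, Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',·) − Q(s,a)].

```python
import random


def newsvendor_profit(order, demand, price=10.0, cost=6.0, salvage=2.0):
    """Single-period profit: sales revenue, minus purchase cost, plus salvage on leftovers.

    price, cost, and salvage are illustrative values, not taken from the paper.
    """
    sold = min(order, demand)
    return price * sold - cost * order + salvage * (order - sold)


def train_q_learning(periods=20000, alpha=0.1, gamma=0.9, epsilon=0.1,
                     max_qty=20, seed=0):
    """Tabular Q-learning over simulated demand samples.

    Assumed (hypothetical) design: state = previous period's demand,
    action = order quantity in 0..max_qty.
    """
    rng = random.Random(seed)
    n = max_qty + 1
    q = [[0.0] * n for _ in range(n)]  # q[state][action]
    state = rng.randint(0, max_qty)
    for _ in range(periods):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: random order quantity
        else:
            action = max(range(n), key=lambda a: q[state][a])  # exploit
        demand = rng.randint(0, max_qty)  # simulated demand sample
        reward = newsvendor_profit(action, demand)
        next_state = demand
        # Standard Q-learning update with learning rate alpha and discount gamma.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Order quantity with the highest learned value for each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]
```

Calling `greedy_policy(train_q_learning())` returns one order quantity per previous-demand state. Varying `alpha` and `epsilon` in such a sketch is one way to explore the kind of parameter sensitivity the abstract describes, although the numbers it produces are not comparable to the paper's results.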
Authors QI Yu-qing (祁玉青); ZHAO Xing-lei (赵兴雷); ZHAO Tian-dong-jie (赵田东杰) (School of Economics and Management, Nanjing Tech University, Nanjing 211816, China)
Source Operations Research and Management Science (《运筹与管理》), indexed in CSSCI, CSCD, and the PKU Core list; 2022, No. 10, pp. 105-112 (8 pages)
Funding National Natural Science Foundation of China, Youth Program (71701092); National Social Science Fund of China (20BGL025).
Keywords inventory model; framework agreement; Q-learning algorithm
