Abstract
To stabilize supply and supplier relationships, enterprises often sign framework agreements with suppliers for a fixed period. To address how a retailer should purchase newsvendor-type products under a framework agreement, this paper builds an inventory decision model based on reinforcement learning and uses the Q-learning algorithm to obtain a relatively good ordering policy. Demand is simulated with randomly generated samples, and ordering by Q-learning is compared with ordering by traditional methods. Across repeated numerical experiments, ordering with reinforcement learning raises average profit by about 7%~22% relative to traditional ordering methods (fixed-quantity ordering, moving-average forecasting, and exponential smoothing), and its average profit differs from that of the ideal state by about 8%. These findings support the effectiveness and feasibility of reinforcement learning for inventory problems. The paper also examines how changes in the relevant parameters affect total profit: profit decreases as the greedy rate (ε) increases and increases as the learning rate (α) increases. These conclusions offer a new approach to related inventory problems.
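The abstract names Q-learning with a greedy rate ε and a learning rate α but does not give the model's states, actions, or rewards. The following is a minimal sketch of tabular Q-learning applied to a newsvendor-style ordering problem, not the authors' model. Everything specific in it is an assumption made for illustration: the state (the previous period's demand), the uniform integer demand, the price, cost, and salvage values, the discount factor `gamma`, and the function names `newsvendor_profit`, `train_q_learning`, and `greedy_policy`.

```python
import random


def newsvendor_profit(order, demand, price=10.0, cost=6.0, salvage=2.0):
    """Single-period profit: sell min(order, demand), salvage the leftovers.

    The price, cost, and salvage values are illustrative assumptions.
    """
    sold = min(order, demand)
    return price * sold + salvage * (order - sold) - cost * order


def train_q_learning(periods=20000, alpha=0.1, gamma=0.9, epsilon=0.1,
                     max_qty=20, seed=0):
    """Tabular Q-learning with an epsilon-greedy ordering rule.

    Assumed setup (not taken from the paper):
      state  = demand observed in the previous period (0..max_qty)
      action = order quantity for the current period (0..max_qty)
      reward = newsvendor profit of the current period
    """
    rng = random.Random(seed)
    actions = list(range(max_qty + 1))
    # q[state][action] holds the current estimate of the action value.
    q = [[0.0] * len(actions) for _ in range(max_qty + 1)]
    state = rng.randint(0, max_qty)
    for _ in range(periods):
        # With probability epsilon explore, otherwise pick the best-known order.
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: q[state][a])
        # Simulated demand: uniform integers, an assumption for this sketch.
        demand = rng.randint(0, max_qty)
        reward = newsvendor_profit(action, demand)
        next_state = demand
        # Standard Q-learning update with learning rate alpha.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Return the highest-valued order quantity for each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    table = train_q_learning()
    print(greedy_policy(table))
```

Re-running `train_q_learning` with different `epsilon` and `alpha` values and comparing the simulated profit of the resulting policies is one way to run the kind of parameter study the abstract describes. This toy setup should not be expected to reproduce the reported figures.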
Authors
祁玉青
赵兴雷
赵田东杰
QI Yu-qing; ZHAO Xing-lei; ZHAO Tian-dong-jie (School of Economics and Management, Nanjing Tech University, Nanjing 211816, China)
Source
《运筹与管理》
CSSCI
CSCD
Peking University Core Journals (北大核心)
2022, No. 10, pp. 105-112 (8 pages)
Operations Research and Management Science
Funding
National Natural Science Foundation of China, Young Scientists Fund (71701092)
National Social Science Fund of China (20BGL025)
Keywords
inventory model
framework agreement
Q-learning algorithm