Abstract
To solve effectively the joint decision problem of ordering, old-product disposal, and pricing that a retailer faces when selling perishable products with a multi-period shelf life over an infinite horizon, a model based on the Markov decision process was established, and the optimal policy was computed with the Q-learning algorithm. The optimal policy specifies the action to take in each state so as to maximize the total expected discounted profit from the current period onward. By interacting with the environment continuously and receiving feedback, the iterative update formula of the algorithm keeps revising the optimal policy. When the state and action sets are finite and discrete, the algorithm obtains a stationary optimal policy after sufficiently long learning, even though the state transition probabilities and the expected one-period profit are unknown. The study found that changes in the parameters have different and significant effects on the characteristics of each decision within the joint policy; this conclusion offers some theoretical support and ideas for research on heuristic policies.
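The approach described in the abstract can be illustrated with a minimal tabular Q-learning sketch. This is not the paper's model: the two-period shelf life, the demand distribution, the costs, the single price for old and fresh units, and all names (`step`, `sample_demand`, `q_learning`, `greedy_policy`) are assumptions chosen only to show how an agent can learn ordering, disposal, and pricing actions from simulated feedback without knowing the transition probabilities.

```python
import random
from itertools import product

# All numbers below are illustrative assumptions, not values from the paper.
MAX_STOCK = 3            # maximum units carried over as "old" stock
ORDERS = (0, 1, 2, 3)    # order quantities for fresh units
PRICES = (4.0, 6.0)      # candidate selling prices
DISPOSE = (False, True)  # whether to discard old stock before selling
UNIT_COST = 2.0
DISPOSAL_COST = 0.5
GAMMA = 0.9              # discount factor
ALPHA = 0.1              # learning rate
EPSILON = 0.1            # exploration rate

STATES = tuple(range(MAX_STOCK + 1))
ACTIONS = tuple(product(ORDERS, PRICES, DISPOSE))


def sample_demand(price, rng):
    """Hypothetical demand: the lower price allows a higher maximum demand."""
    high = 4 if price == PRICES[0] else 2
    return rng.randint(0, high)


def step(old, action, rng):
    """Simulate one period and return (reward, next_old_stock)."""
    order, price, dispose = action
    cost = UNIT_COST * order
    if dispose:
        cost += DISPOSAL_COST * old
        old = 0
    demand = sample_demand(price, rng)
    sold_old = min(old, demand)               # assumed: old units sell first
    sold_new = min(order, demand - sold_old)
    reward = price * (sold_old + sold_new) - cost
    # Unsold old units expire; unsold fresh units become next period's old stock.
    next_old = min(order - sold_new, MAX_STOCK)
    return reward, next_old


def q_learning(steps=50_000, seed=0):
    """Learn Q-values from simulated experience with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = 0
    for _ in range(steps):
        if rng.random() < EPSILON:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        reward, nxt = step(state, action, rng)
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        # Standard Q-learning update toward the one-step bootstrapped target.
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = nxt
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


if __name__ == "__main__":
    policy = greedy_policy(q_learning())
    for s, (order, price, dispose) in policy.items():
        print(f"old stock {s}: order {order}, price {price}, dispose {dispose}")
```

The learner only calls `step` and never reads the demand distribution directly, which mirrors the abstract's setting of unknown transition probabilities and expected profits. A fixed learning rate and exploration rate are used here for brevity; whether the paper uses fixed or decaying rates is not stated in the abstract.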
Source
Computer Integrated Manufacturing Systems (《计算机集成制造系统》)
EI
CSCD
Peking University Core Journals (北大核心)
2017, Issue 1, pp. 144-153 (10 pages)
Funding
Supported by the Natural Science Foundation of Guangdong Province (2016Z00052)
Keywords
perishable product
Markov decision process
Q-learning algorithm
ordering decisions
pricing decisions