Abstract
Most existing research on hotel dynamic pricing either focuses on improving demand forecasting methods or assumes that the demand environment is known, whereas in practice the demand distribution is usually unknown. This paper considers the setting in which the demand distribution is unknown, formulates a multi-period dynamic pricing model for hotel rooms as a Markov decision process, and uses reinforcement learning to solve it with improved algorithms based on SARSA(λ). To improve solution quality and convergence speed, two algorithms are proposed: ε-SARSA(λ), based on an improved ε-greedy strategy, and ISA-SARSA(λ), based on an improved simulated annealing strategy. Numerical experiments compare the revenue optimization results of four algorithms: SARSA(λ), ε-SARSA(λ), SA-SARSA(λ), and ISA-SARSA(λ). The results verify the effectiveness of the improved algorithms and show that ISA-SARSA(λ) achieves the best solution performance.
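For orientation, the baseline that the abstract builds on can be sketched as a tabular SARSA(λ) learner on a toy room-pricing problem. This is only an illustrative sketch, not the authors' model: the abstract does not give the state, action, or reward definitions, so the state (period, rooms left), the three price levels, the purchase probabilities in `buy_prob`, and all hyperparameters below are assumptions. Exploration here is plain fixed-ε greedy; the improved ε-greedy and simulated annealing strategies of ε-SARSA(λ) and ISA-SARSA(λ) are not reproduced.

```python
import random


def sarsa_lambda(episodes=2000, periods=5, rooms=3, prices=(80, 100, 120),
                 alpha=0.1, gamma=1.0, lam=0.8, epsilon=0.1, seed=0):
    """Tabular SARSA(lambda) on a hypothetical multi-period room-pricing MDP."""
    rng = random.Random(seed)
    # Assumed demand: probability that one booking arrives at each price.
    # The learner never reads this table; it only observes realized sales.
    buy_prob = {80: 0.9, 100: 0.6, 120: 0.3}
    q = {}  # q[((period, rooms_left), price)] -> estimated revenue-to-go

    def choose(state, eps):
        # Fixed epsilon-greedy: explore with probability eps, else act greedily.
        if rng.random() < eps:
            return rng.choice(prices)
        return max(prices, key=lambda p: q.get((state, p), 0.0))

    for _ in range(episodes):
        trace = {}  # eligibility traces, reset at the start of every episode
        state = (0, rooms)
        action = choose(state, epsilon)
        while True:
            period, left = state
            sold = 1 if rng.random() < buy_prob[action] else 0
            reward = action * sold
            next_state = (period + 1, left - sold)
            done = next_state[0] == periods or next_state[1] == 0
            if done:
                next_action = None
                target = reward
            else:
                # On-policy: the next action is drawn from the same policy.
                next_action = choose(next_state, epsilon)
                target = reward + gamma * q.get((next_state, next_action), 0.0)
            delta = target - q.get((state, action), 0.0)
            # Accumulating trace for the pair just visited.
            trace[(state, action)] = trace.get((state, action), 0.0) + 1.0
            for key in list(trace):
                q[key] = q.get(key, 0.0) + alpha * delta * trace[key]
                trace[key] *= gamma * lam  # decay every trace
            if done:
                break
            state, action = next_state, next_action
    return q
```

Calling `sarsa_lambda()` returns the learned Q-table; a greedy price for a state such as `(0, 3)` can then be read off by taking the price with the largest Q-value for that state.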
Authors
ZHU Han (朱晗)
ZHANG Min (张敏)
TANG Jiafu (唐加福)
School of Management Science and Engineering, Dongbei University of Finance and Economics, Dalian 116025, China
Source
《系统工程理论与实践》
EI
CSCD
PKU Core Journals
2023, Issue 2, pp. 509-523 (15 pages)
Systems Engineering-Theory & Practice
Funding
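National Natural Science Foundation of China, General Program (72272027)
National Natural Science Foundation of China, Young Scientists Fund (71902018)
National Natural Science Foundation of China, Key Program (71831003)
Natural Science Foundation of Liaoning Province (2022–KF–11–06)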