Journal Article

Semi-Markov adaptive critic heuristics with application to airline revenue management (Cited by: 1)

Abstract: The adaptive critic heuristic has been a popular algorithm in reinforcement learning (RL) and approximate dynamic programming (ADP) alike. It is one of the first RL and ADP algorithms. RL and ADP algorithms are particularly useful for solving Markov decision processes (MDPs) that suffer from the curses of dimensionality and modeling. Many real-world problems, however, tend to be semi-Markov decision processes (SMDPs), in which the time spent in each transition of the underlying Markov chains is itself a random variable. Unfortunately, in the average reward case, unlike the discounted reward case, the MDP algorithm does not extend easily to the SMDP. Examples of SMDPs can be found in the areas of supply chain management, maintenance management, and airline revenue management. In this paper, we propose an adaptive critic heuristic for the SMDP under the long-run average reward criterion. We present a convergence analysis of the algorithm, which shows that under certain mild conditions, which can be ensured within a simulator, the algorithm converges to an optimal solution with probability 1. We test the algorithm extensively on a problem of airline revenue management in which the manager has to set prices for airline tickets over the booking horizon. The problem is large-scale, suffering from the curse of dimensionality, and is hence difficult to solve via classical methods of dynamic programming. Our numerical results are encouraging and show that the algorithm outperforms an existing heuristic used widely in the airline industry.
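To make the objective concrete: in the long-run average reward SMDP, one maximizes rho = lim E[sum of the first k rewards] / E[sum of the first k sojourn times] as k grows, so the average reward is charged per unit of random transition time rather than once per transition. The following is a minimal sketch of a tabular adaptive critic (actor-critic) in this spirit; it is not the paper's exact algorithm, and the toy_smdp_step simulator, the step-size schedules, and the Boltzmann actor are all illustrative assumptions.

import numpy as np

# Minimal sketch of an adaptive critic (actor-critic) for a semi-Markov
# decision process (SMDP) under the long-run average reward criterion.
# NOT the paper's exact algorithm: the toy SMDP, the step sizes, and the
# Boltzmann actor are illustrative assumptions.

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 5, 2

def toy_smdp_step(state, action):
    """Hypothetical SMDP simulator: returns (next_state, reward, sojourn_time).
    The random sojourn time tau is what makes the process semi-Markov."""
    next_state = int(rng.integers(N_STATES))
    reward = 0.1 * (action + 1) * (next_state - state) + rng.normal(0.0, 0.05)
    tau = rng.exponential(1.0 + 0.5 * action)  # random transition time
    return next_state, reward, tau

values = np.zeros(N_STATES)              # critic: relative value function
prefs = np.zeros((N_STATES, N_ACTIONS))  # actor: action preferences
rho = 0.0                                # average reward per unit time
total_reward, total_time = 0.0, 0.0

state = 0
for k in range(1, 200_001):
    alpha = 100.0 / (1000.0 + k)  # critic step size (diminishing)
    beta = 10.0 / (1000.0 + k)    # actor step size, on a slower timescale

    # Boltzmann (softmax) action selection from the actor's preferences.
    z = np.exp(prefs[state] - prefs[state].max())
    action = int(rng.choice(N_ACTIONS, p=z / z.sum()))

    next_state, reward, tau = toy_smdp_step(state, action)

    # SMDP temporal-difference error: the average reward rho is charged
    # per unit of random sojourn time (reward - rho * tau), not once per
    # transition as in the ordinary MDP case.
    delta = reward - rho * tau + values[next_state] - values[state]

    values[state] += alpha * delta        # critic update
    prefs[state, action] += beta * delta  # actor update

    # Estimate rho from cumulative reward over cumulative time.
    total_reward += reward
    total_time += tau
    rho = total_reward / total_time

    state = next_state

print(f"Estimated long-run average reward per unit time: {rho:.4f}")

Note the two-timescale structure: the actor's step size decays faster than the critic's, which is the kind of condition that convergence analyses of actor-critic schemes typically rely on.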
Source: Journal of Control Theory and Applications (EI indexed), 2011, Issue 3, pp. 421-430 (10 pages)
Funding: Supported by the National Science Foundation (No. ECS0841055)
Keywords: Adaptive critics; Actor critics; Semi-Markov; Approximate dynamic programming; Reinforcement learning

Co-cited references: 13

Citing articles: 1

Secondary citing articles: 2
