Average Optimality in Markov Decision Processes with Unbounded Rewards (article in English)

Abstract: This paper studies average optimality in Markov decision processes with a countable state space, nonempty action sets, and an unbounded reward function. New conditions are presented under which an (ε-)optimal stationary policy exists and the average-criterion optimality inequality holds whenever the summation in it is well defined.

Author: Hu Qiying (胡奇英)

Affiliation: School of Economics and Management, Xidian University

Source: Operations Research Transactions (《运筹学学报》), indexed in CSCD and the Peking University Core Journals list, 2002, No. 1, pp. 1-8 (8 pages)

Funding: The project was supported by the National Natural Science Foundation of China.

Keywords: Markov decision process, average criterion, optimality inequality, unbounded rewards, nonempty action sets

Classification: O221.5 [Science: Operations Research and Cybernetics]; O211.62 [Science: Probability Theory and Mathematical Statistics]



