Abstract
This paper further discusses the infinite-horizon partially observable Markov decision programming problem with discounted costs considered by Sondik and Sawaki [1,2]. It clarifies some ambiguous concepts in [1,2] and supplies omissions in, or corrects errors of, [1,2], in particular mistakes in [2]. Notably, it enlarges the class of finitely transient policies while preserving the piecewise linearity of the cost function. Finally, it establishes several new results and gives a corrected version of the policy iteration algorithm of [1], together with an estimate of its convergence.
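To make "piecewise-linear cost function" concrete, here is a minimal Python sketch. It assumes a hypothetical two-state, two-action, two-observation POMDP with made-up numbers. A cost function over beliefs (state distributions) is stored as a finite set of vectors alpha_k with V(pi) = min_k alpha_k . pi. `backup` performs one standard exact dynamic-programming step by enumeration, showing that the result is again piecewise linear. `bellman_at` evaluates the same step directly at one belief through the Bayes belief update. All names and numbers are illustrative assumptions. This is not the policy iteration algorithm of [1] or the correction proposed in this paper.

```python
from itertools import product

# Hypothetical POMDP; every number here is an illustrative assumption.
BETA = 0.9                       # discount factor
S, A, O = 2, 2, 2                # numbers of states, actions, observations
COST = [[1.0, 0.0],              # COST[s][a]: immediate cost of action a in state s
        [0.0, 2.0]]
P = [  # P[a][s][t]: probability of moving from state s to state t under action a
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.5, 0.5]],
]
Q = [  # Q[a][t][o]: probability of observing o after reaching state t under action a
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.5, 0.5], [0.5, 0.5]],
]


def value(alphas, belief):
    """Piecewise-linear (concave) cost function: V(pi) = min_k alpha_k . pi."""
    return min(sum(x * p for x, p in zip(alpha, belief)) for alpha in alphas)


def belief_update(belief, a, o):
    """Bayes update of the state distribution after action a and observation o."""
    unnorm = [sum(belief[s] * P[a][s][t] for s in range(S)) * Q[a][t][o]
              for t in range(S)]
    total = sum(unnorm)  # equals Pr(o | belief, a); caller ensures it is > 0
    return [x / total for x in unnorm]


def backup(alphas):
    """One exact DP step by enumeration (no pruning of dominated vectors).

    Each new vector fixes an action a and, for every observation o, one old
    vector alphas[choice[o]]; the minimum over all such vectors is V_{n+1},
    so the new cost function is again a finite minimum of linear functions.
    """
    new = set()
    for a in range(A):
        for choice in product(range(len(alphas)), repeat=O):
            vec = tuple(
                COST[s][a] + BETA * sum(
                    P[a][s][t] * Q[a][t][o] * alphas[choice[o]][t]
                    for t in range(S) for o in range(O))
                for s in range(S))
            new.add(vec)
    return sorted(new)


def bellman_at(alphas, belief):
    """Right-hand side of the Bellman equation, evaluated directly at one belief."""
    best = float("inf")
    for a in range(A):
        total = sum(belief[s] * COST[s][a] for s in range(S))
        for o in range(O):
            prob_o = sum(belief[s] * P[a][s][t] * Q[a][t][o]
                         for s in range(S) for t in range(S))
            if prob_o > 0:
                total += BETA * prob_o * value(alphas, belief_update(belief, a, o))
        best = min(best, total)
    return best


if __name__ == "__main__":
    V = [(0.0, 0.0)]             # V_0 = 0 is trivially piecewise linear
    for _ in range(3):
        V = backup(V)
    b = [0.3, 0.7]
    print(len(V), value(V, b))
```

Because the vector set can grow as |A| * |K|^|O| per step, practical implementations prune dominated vectors. The sketch omits pruning to keep the enumeration visible.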
Source
Applied Mathematics A Journal of Chinese Universities (Ser. A), 1993, No. 2, pp. 210-221 (12 pages)
Indexed in: CSCD; Peking University Core Journals
Funding
National Youth Science Foundation of China
Keywords
Markov Decision Programming, Transient Policy, Piecewise-linear, Iteration Algorithm, Estimation of Convergence.