Abstract
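I. Introduction. Consider a partially observable Markov decision programming model (POMDP for short), and adopt the definitions, notation (with only δ_N replaced by ∏ and (?) replaced by A), and related results used in [1]. Then the information vectors π(t) and π(t+1)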
This paper discusses an optimization problem for partially observable Markov decision programming with a stochastic variable discount factor over a finite horizon. It is shown that the optimal return function is piecewise-linear and convex and, furthermore, total-convex. It is also shown that there exist optimal decision functions that are piecewise-constant. Based on these results, the “one pass” algorithm of [1] can be used with only a few revisions. The algorithm is simple and efficient.
Source
《系统科学与数学》
CSCD
Peking University Core Journal
1993, No. 2, pp. 152-159 (8 pages)
Journal of Systems Science and Mathematical Sciences
Funding
Project supported by the National Natural Science Foundation of China