Abstract
This paper studies the structure of moment optimal policies in the Lippman-type discounted semi-Markov decision model (DSMDM) with a countable state space, an arbitrary action space, and unbounded rewards. It is shown that if policies $\pi$ and $\sigma$ are (K) moment optimal, then the combined policy $\pi\,{}^{n}\sigma$ and every self-combination policy of $\pi$ are also (K) moment optimal, and that there exists a (K) moment optimal policy $\pi^{*}$, equivalent to $\pi$, such that ${}^{n}\pi^{*h_n}$ is (K) moment optimal. A (K) moment optimal policy exists if and only if the (K) moment optimal action set $A_K(i)$ is nonempty for each state $i$. A policy $\pi$ is (K) moment optimal if and only if $\pi_n(A_K(i) \mid H_n, i) = 1$ a.e. $P_{\pi n}$. A further necessary and sufficient condition is that $\pi$ can be decomposed into a convex combination of deterministic (K) moment optimal policies. With these results, the structure of moment optimal policies in this model is settled fairly completely.
Source
Journal of Yunnan University (Natural Sciences Edition)
CAS
CSCD
1990, No. 4, pp. 299-306 (8 pages)
Keywords
discounted model, unbounded rewards, moments, optimal policy