Abstract
A class of time-nonhomogeneous Markov decision models with unbounded recursive vector-valued reward functions is studied. The vector optimality equations for the model are established. The definition of a vector ε-optimal policy for the model is given for the first time, together with sufficient conditions for its existence. Efficient policies and optimal policies of the model are also discussed.
Keywords
Markov decision programming
vector-valued reward function
recursive reward function
vector ε-optimal policy
efficient policy
optimal policy