Abstract
This paper studies a non-stationary discounted Markov decision model with unbounded rewards, in which the discount factor β_t depends on the state occupied and the action taken at the previous stage, thereby generalizing Markov decision models with a constant discount factor. Under certain assumptions, the optimality equations are established and the existence of an ε-optimal Markov policy is proved.
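For orientation, a criterion of this kind is commonly written as below. This is only a sketch under assumed notation: the symbols x_t, a_t, r_t, p_t, A_t, V_t^* and the countable state space are illustrative assumptions, not taken from the paper, whose exact formulation and conditions may differ.

```latex
% Assumed notation (not from the paper): x_t = state, a_t = action,
% r_t = stage-t reward, p_t = stage-t transition law,
% beta_k(x_{k-1}, a_{k-1}) = discount factor determined by the previous stage.
\[
  V(\pi, i) \;=\; \mathbb{E}^{\pi}_{i}\!\left[\,\sum_{t=0}^{\infty}
    \Bigl(\prod_{k=1}^{t} \beta_k(x_{k-1}, a_{k-1})\Bigr)\, r_t(x_t, a_t)\right],
  \qquad \text{with the empty product } (t = 0) \text{ equal to } 1 .
\]
% A matching form of the non-stationary optimality equations:
\[
  V_t^{*}(i) \;=\; \sup_{a \in A_t(i)}
    \Bigl\{\, r_t(i, a) \;+\; \beta_{t+1}(i, a) \sum_{j} p_t(j \mid i, a)\, V_{t+1}^{*}(j) \Bigr\},
  \qquad t = 0, 1, 2, \dots
\]
```

Setting every β_k equal to a single constant β recovers the usual discount weight β^t, which is the sense in which a state- and action-dependent discount factor generalizes the constant case.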
Source
Journal of Hengyang Normal University (《衡阳师专学报》)
1997, No. 6, pp. 16-22 (7 pages)
Keywords
non-stationary discounting
Markov decision model
unbounded reward
optimality equation
ε-optimal Markov policy