摘要
人类在处理问题中往往分为两个层次,首先在整体上把握问题,即提出大体方案,然后再具体实施。也就是说人类就是具有多分辨率智能系统的极好例子,他能够在多个层次上从底向上泛化(即看问题角度粒度变"粗",它类似于抽象),并且又能从顶向下进行实例化(即看问题角度变"细",它类似于具体化)。由此构造了由在双层(理想空间即泛化和实际空间即实例化)上各自运行的马尔可夫决策过程组成的半马尔可夫决策过程,称之为双马尔可夫决策过程联合模型。然后讨论该联合模型的最优策略算法,最后给出一个实例说明双马尔可夫决策联合模型能够经济地节约"思想",是运算有效性和可行性的一个很好的折中。
Human thought is often divided two levels while dealing with problems. First people always treat problems from a whole perspective, i. e. , they have a general plan, then they specifically deal with details. The human itself is a good example for having a multi-resolutional characteristic. It can not only generalize bottom-up among multi-levels (the granule of viewpoint about problem becomes "rough", analogous to abstract), but also instantiate top-down (the granule of viewpoint becomes "thin",analogous to specification). So we constructed a semi-Markov decision process consisting of two Markov decision processes running respectively on two levels--the ideal space (generalization) and the actual space (instantiation). It is called an associated bi-Markov decision model Then we discussed how to find optimal policy under this associated model. Finally an example was given to show that the associated bi-Markov decision process model can economically economize "mind" and is a good tradeoff between the computational validity and computational feasibility.
出处
《计算机科学》
CSCD
北大核心
2009年第9期161-166,共6页
Computer Science
基金
国家自然科学基金(90412014
60803061)
江苏省自然科学基金(BK2008293)资助
关键词
马尔可夫决策过程
增强学习
最优策略
Markov decision processes, Reinforcement learning, Optimal policy