Abstract
Existing adaptive decision methods for Markov decision processes (MDPs) address only the infinite planning horizon. To fill this gap, an adaptive decision algorithm is proposed for MDPs with a finite planning horizon. The basic idea is to apply Bayesian theory to "learn" the unknown system and, at each decision, to choose the action that maximizes the probability that the actual decision is the optimal one. Simulation results demonstrate the effectiveness of the algorithm.
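The abstract gives only the idea of the algorithm, not its details. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' method: transition probabilities are unknown and carry a Dirichlet posterior (stored as pseudo-counts), rewards are assumed known, and the probability that an action is optimal is estimated by sampling models from the posterior and solving each one by backward induction. All function and parameter names (`sample_dirichlet`, `backward_induction`, `most_probably_optimal_action`, `update_counts`, `n_samples`) are hypothetical.

```python
import random
from collections import Counter


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def backward_induction(P, R, horizon):
    """Solve a finite-horizon MDP by dynamic programming.

    P[s][a] is a list of next-state probabilities and R[s][a] an immediate
    reward. Returns policy[t][s], the optimal action at decision epoch t.
    """
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states  # terminal value is assumed to be zero
    policy = []
    for _ in range(horizon):
        Q = [[R[s][a] + sum(p * v for p, v in zip(P[s][a], V))
              for a in range(n_actions)] for s in range(n_states)]
        step = [max(range(n_actions), key=lambda a: Q[s][a])
                for s in range(n_states)]
        V = [Q[s][step[s]] for s in range(n_states)]
        policy.append(step)
    policy.reverse()  # computed from the last epoch backward; put epoch 0 first
    return policy


def most_probably_optimal_action(counts, R, state, steps_left,
                                 n_samples=200, rng=None):
    """Pick the action that is optimal in the largest share of sampled models.

    counts[s][a] holds Dirichlet pseudo-counts over next states (assumption:
    this is the Bayesian posterior). steps_left must be at least 1.
    """
    rng = rng or random.Random(0)
    votes = Counter()
    for _ in range(n_samples):
        P = [[sample_dirichlet(counts[s][a], rng)
              for a in range(len(counts[s]))] for s in range(len(counts))]
        votes[backward_induction(P, R, steps_left)[0][state]] += 1
    return votes.most_common(1)[0][0]


def update_counts(counts, state, action, next_state):
    """Bayesian update: add one observed transition to the pseudo-counts."""
    counts[state][action][next_state] += 1
```

In use, one would call `most_probably_optimal_action` at each decision epoch with the remaining number of steps, apply the chosen action to the real system, and pass the observed transition to `update_counts`. Sampling is only one way to estimate the probability of optimality; it is chosen here because it needs nothing beyond the standard library.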
Source
Journal of Applied Sciences (《应用科学学报》)
CAS
CSCD
2000, No. 4, pp. 335-339 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 69874025)