Abstract
An adaptive decision algorithm is proposed for partially observable Markov decision processes (POMDPs) with uncertain parameters and a finite planning horizon. The basic idea is to use Bayesian theory to "learn" the unknown system: the parameter is estimated by a parameter decision that minimizes the probability of decision error, and the control decision is then made on the basis of that estimate, so that the probability that each decision is optimal is maximized. The convergence of the algorithm is proved, and simulation results demonstrate its effectiveness.
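The abstract gives only the outline of the method, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes a finite set of candidate parameter values, binary observations whose success probability equals the unknown parameter, a uniform prior, and a lookup table giving the preferred action for each candidate. It also ignores the hidden state and the finite planning horizon entirely. All function names, candidate values, and actions are hypothetical. Picking the candidate with the largest posterior probability is the choice that minimizes the posterior probability of a wrong parameter decision.

```python
from typing import List, Sequence, Tuple


def bayes_update(prior: Sequence[float], likelihoods: Sequence[float]) -> List[float]:
    """Posterior over candidate parameters after one observation (Bayes' rule)."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under every candidate")
    return [u / total for u in unnormalized]


def map_estimate(posterior: Sequence[float]) -> int:
    """Index of the most probable candidate.

    Choosing it minimizes the posterior probability of picking the wrong parameter.
    """
    return max(range(len(posterior)), key=lambda i: posterior[i])


def learn_and_decide(
    observations: Sequence[int],
    thetas: Sequence[float],
    actions: Sequence[str],
) -> Tuple[List[float], float, str]:
    """Learn from binary observations, then act on the estimated parameter.

    thetas:  hypothetical candidate success probabilities (assumption).
    actions: hypothetical action judged best for each candidate (assumption).
    """
    posterior = [1.0 / len(thetas)] * len(thetas)  # uniform prior (assumption)
    for y in observations:
        # Bernoulli likelihood of y under each candidate parameter.
        likelihoods = [t if y == 1 else 1.0 - t for t in thetas]
        posterior = bayes_update(posterior, likelihoods)
    best = map_estimate(posterior)
    # Control decision taken as if the estimated parameter were the true one.
    return posterior, thetas[best], actions[best]
```

For example, `learn_and_decide([1, 1, 0, 1], [0.2, 0.8], ["a", "b"])` concentrates the posterior on the second candidate and returns its associated action. A fuller treatment would replace the lookup table with a finite-horizon dynamic program over belief states.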
Source
《上海交通大学学报》
EI
CAS
CSCD
PKU Core
2000, No. 12, pp. 1653-1657 (5 pages)
Journal of Shanghai Jiaotong University
Funding
Supported by the National Natural Science Foundation of China (69874025)
Keywords
Partially observable Markov decision process
Adaptive control
Bayesian principle
Adaptive control systems
Learning algorithms
Markov processes
Optimization
Parameter estimation