Parameter Decision in Adaptive Partially Observable Markov Decision Process with Finite Planning Horizon


Abstract: An adaptive decision algorithm is proposed for partially observable Markov decision processes (POMDPs) with uncertain parameters and a finite planning horizon. The basic idea is to use Bayes' theorem to "learn" the unknown system, to estimate the parameters through a parameter decision that minimizes the probability of decision error, and then to make the control decision on the basis of the estimated parameters, so that each decision is optimal with maximum probability. The convergence of the algorithm is proved, and simulation results demonstrate its effectiveness.
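The abstract describes three steps: Bayesian learning of the unknown system, a parameter decision that minimizes the probability of decision error, and a control decision based on the estimated parameter. The Python sketch below illustrates that loop under assumptions the abstract does not state: the unknown parameter ranges over a finite set of candidate models; the minimum-error parameter decision is read as the maximum a posteriori (MAP) choice, which minimizes the probability of picking the wrong parameter under 0-1 loss; and the control is computed by certainty equivalence, solving the finite-horizon POMDP exactly for the chosen model. The names `CandidateModel`, `bayes_update`, `parameter_decision`, `control_decision` and the two-state toy models are hypothetical; this is not the authors' algorithm.

```python
# Sketch (hypothetical, not the paper's algorithm): Bayesian learning of an
# unknown POMDP parameter, a MAP parameter decision, and a certainty-
# equivalent finite-horizon control decision. Toy models are illustrative.
from typing import Dict, List, Tuple

Matrix = List[List[float]]


class CandidateModel:
    """One hypothetical parameter value theta.

    T[a][s][s2]: probability of moving from s to s2 under action a.
    O[a][s2][y]: probability of observing y after reaching s2 under action a.
    """

    def __init__(self, T: List[Matrix], O: List[Matrix]):
        self.T, self.O = T, O


def predict_and_weigh(b: List[float], m: CandidateModel, a: int, y: int) -> List[float]:
    """Unnormalised P(x_{t+1} = s2, y | b, a) for one fixed model."""
    n = len(b)
    return [sum(b[s] * m.T[a][s][s2] for s in range(n)) * m.O[a][s2][y]
            for s2 in range(n)]


def bayes_update(alpha: Dict[str, List[float]], models: Dict[str, CandidateModel],
                 a: int, y: int) -> Dict[str, List[float]]:
    """Joint filter: alpha[theta][s] is P(theta, x_t = s | history).

    Renormalising over all (theta, s) pairs keeps the ratios, so the
    parameter posterior is simply sum(alpha[theta])."""
    new = {th: predict_and_weigh(alpha[th], m, a, y) for th, m in models.items()}
    total = sum(sum(v) for v in new.values())
    return {th: [p / total for p in v] for th, v in new.items()}


def parameter_decision(alpha: Dict[str, List[float]]) -> str:
    """MAP choice (our assumed reading of the minimum-error rule): under
    0-1 loss it minimises P(chosen theta is wrong)."""
    return max(alpha, key=lambda th: sum(alpha[th]))


def q_value(b: List[float], m: CandidateModel, R: Matrix, a: int, k: int) -> float:
    """Expected reward of action a now, then optimal play for k-1 stages."""
    q = sum(bs * r for bs, r in zip(b, R[a]))
    for y in range(len(m.O[a][0])):
        joint = predict_and_weigh(b, m, a, y)
        py = sum(joint)  # P(y | b, a)
        if py > 0:
            q += py * value([p / py for p in joint], m, R, k - 1)
    return q


def value(b: List[float], m: CandidateModel, R: Matrix, k: int) -> float:
    """Exact belief-space DP with k stages left (cost grows exponentially in k)."""
    if k == 0:
        return 0.0
    return max(q_value(b, m, R, a, k) for a in range(len(R)))


def control_decision(alpha: Dict[str, List[float]], models: Dict[str, CandidateModel],
                     R: Matrix, k: int) -> Tuple[str, int]:
    """Certainty equivalence: plan with the MAP model as if it were the truth."""
    th = parameter_decision(alpha)
    mass = sum(alpha[th])
    b = [p / mass for p in alpha[th]]
    best = max(range(len(R)), key=lambda a: q_value(b, models[th], R, a, k))
    return th, best


def toy_models() -> Dict[str, CandidateModel]:
    """Two hypothetical candidates: a 90%-accurate sensor vs. a useless one."""
    stay = [[1.0, 0.0], [0.0, 1.0]]
    good = [[0.9, 0.1], [0.1, 0.9]]
    noise = [[0.5, 0.5], [0.5, 0.5]]
    return {"A": CandidateModel([stay, stay], [good, good]),
            "B": CandidateModel([stay, stay], [noise, noise])}


if __name__ == "__main__":
    models = toy_models()
    R = [[1.0, 0.0], [0.0, 1.0]]  # action a pays 1 when the state equals a
    alpha = {th: [0.25, 0.25] for th in models}  # uniform prior on (theta, x_0)
    for y in [1, 1, 1]:  # all observations taken under action 0
        alpha = bayes_update(alpha, models, 0, y)
    print(control_decision(alpha, models, R, k=2))  # → ('A', 1)
```

In the demo, three consistent observations make the accurate-sensor model "A" the MAP parameter, and the planner then picks the action matching the likely state. Exact belief-space recursion is only practical for short horizons and few observations; a larger problem would need an approximate POMDP solver in its place.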

Authors: Li Jianghong, Han Zhengzhi

Affiliation: Institute of Intelligent Engineering, Shanghai Jiao Tong University

Source: Journal of Shanghai Jiaotong University, 2000, no. 12, pp. 1653-1657 (5 pages). Indexed in EI, CAS, CSCD and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (Grant No. 69874025)

Keywords: partially observable Markov decision process; adaptive control; Bayes principle; adaptive control systems; learning algorithms; Markov processes; optimization; parameter estimation

CLC classification: TP202.4 (Automation and Computer Technology)


