

An Algorithm for Resetting PSR Models
Abstract  Predictive State Representations (PSRs) have been proposed as an alternative to partially observable Markov decision processes (POMDPs) for modeling dynamical systems. Although both POMDPs and PSRs provide general frameworks for solving partially observable problems, in real-world applications a PSR model learned from samples is almost certainly inaccurate. As the number of computation steps grows, the prediction vector computed with such a model may therefore drift farther and farther from its true value, lowering the prediction accuracy of the PSR model. This paper presents an algorithm for resetting learned PSR models: the PSR state the system currently occupies is identified by discriminant analysis, and the computed prediction vector is then reset accordingly, improving the accuracy of the PSR model. Empirical comparisons between PSR models with and without resetting show that resetting the prediction vector yields clearly better prediction accuracy, demonstrating the effectiveness of the proposed algorithm.
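The paper itself gives no code; the sketch below is an illustrative Python reconstruction of the two operations the abstract describes: the standard linear-PSR prediction-vector update, which is where modeling error accumulates, and a reset step that reassigns the drifting vector to a reference PSR state. All names, shapes, parameter values, and the nearest-centroid rule (used here as a simple stand-in for the paper's discriminant-analysis classifier) are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def update_prediction_vector(p, M_ao, m_ao):
    """One linear-PSR update of the prediction vector p(Q|h) after
    executing action a and receiving observation o.

    p    : current prediction vector p(Q|h), shape (k,)
    M_ao : matrix whose i-th column is the parameter vector of the
           one-step-extended core test a o q_i, shape (k, k)
    m_ao : parameter vector of the one-step test a o, so p(ao|h) = m_ao @ p
    """
    denom = float(m_ao @ p)              # p(ao | h)
    if denom <= 1e-12:                   # guard against an inaccurate learned model
        denom = 1e-12
    return (M_ao.T @ p) / denom          # p(Q | h a o)

def reset_prediction_vector(p, reference_states):
    """Reset step: snap the (possibly drifted) prediction vector to the
    closest reference PSR state.  Nearest-centroid classification is used
    here as a stand-in for the paper's discriminant analysis."""
    refs = np.asarray(reference_states)                    # (n_states, k)
    idx = int(np.argmin(np.linalg.norm(refs - p, axis=1)))
    return refs[idx].copy()

# Hypothetical usage with toy parameters (not learned from data):
k = 3
p = np.full(k, 1.0 / k)                              # initial prediction vector
M_ao = 0.5 * np.eye(k)                               # toy update matrix for one (a, o)
m_ao = np.full(k, 0.5)                               # toy one-step test parameters
reference_states = [np.eye(k)[i] for i in range(k)]  # toy reference PSR states

for step in range(1, 11):
    p = update_prediction_vector(p, M_ao, m_ao)
    if step % 5 == 0:                                # reset every few steps
        p = reset_prediction_vector(p, reference_states)
```

The intent, per the abstract, is that the update step compounds model error over time, while a periodic reset replaces the drifted vector with the prediction vector of the identified PSR state, keeping the accumulated error bounded.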
Source  Chinese Journal of Computers (《计算机学报》; EI, CSCD, Peking University Core), 2012, No. 5, pp. 1046-1051 (6 pages).
Funding  Natural Science Foundation of Fujian Province (2010J05140); Specialized Research Fund for the Doctoral Program of Higher Education (20100121120022); National Natural Science Foundation of China (60774033).
Keywords  Predictive State Representation (PSR) model; prediction accuracy; reset; discriminant analysis; veracity of the Predictive State Representation model
Related Literature

References (13)

  • 1 Singh S, James M, Rudary M. Predictive state representations: A new theory for modeling dynamical systems//Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. Banff, Canada, 2004: 512-518.
  • 2 James M. Using predictions for planning and modeling in stochastic environments [Ph.D. dissertation]. University of Michigan, Ann Arbor, USA, 2005.
  • 3 McCallum R A. Hidden state and reinforcement learning with instance-based state identification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 1996, 26(3): 464-473.
  • 4 Kaelbling L P, Littman M L, Cassandra A R. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 1998, 101: 99-134.
  • 5 Littman M, Sutton R, Singh S. Predictive representations of state//Proceedings of the 2001 Neural Information Processing Systems (NIPS) Conference. Vancouver, Canada, 2002: 1555-1561.
  • 6 James M, Singh S. Learning and discovery of predictive state representations in dynamical systems with reset//Proceedings of the 21st International Conference on Machine Learning. Banff, Canada, 2004: 417-424.
  • 7 Wolfe B, James M, Singh S. Learning predictive state representations in dynamical systems without reset//Proceedings of the 22nd International Conference on Machine Learning. Bonn, Germany, 2005: 980-987.
  • 8 Liu Yunlong, Li Renhou. A new algorithm for discovery and learning of predictive state representations of non-resettable dynamical systems. Acta Electronica Sinica (电子学报), 2009, 37(1): 126-131.
  • 9Dinculescu M, Precup D. Approximate predictive representations of partially observable systems//Proceedings of the 27th International Conference on Machine Learning (ICML'10). Haifa, Israel, 2010:985-1002.
  • 10 Liu Yunlong, Ji Guoli, Yang Zijiang. Using learned PSR models for planning under uncertainty//Proceedings of the 23rd Canadian Conference on Artificial Intelligence. University of Ottawa, Ontario, Canada, 2010: 309-314.

