
An Observation-based Optimization Algorithm for POMDP and Its Simulation

Cited by: 1
Abstract: Building on a performance sensitivity analysis of Markov decision processes (MDPs), this paper addresses performance optimization for partially observable Markov decision processes (POMDPs). Performance sensitivity formulas for POMDPs are given, and on this basis two observation-based POMDP optimization algorithms are developed: a policy-gradient algorithm and a policy-iteration algorithm. Finally, an admission control problem is used as a simulation example to verify the effectiveness of both algorithms.
Source: Information and Control (《信息与控制》), CSCD, Peking University Core Journal, 2008, No. 3, pp. 346-351, 376 (7 pages)
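Funding: National Natural Science Foundation of China (60574065); National 863 Program of China (2006AAO1Z114); Natural Science Foundation of Anhui Province (050420301); Seed Fund of the Joint Laboratory of Intelligent Science and Technology, Institute of Automation, Chinese Academy of Sciences and University of Science and Technology of China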
Keywords: partially observable Markov decision process (POMDP); sensitivity analysis; optimization; simulation

References (8)

  • 1. Aberdeen D A. Policy-Gradient Algorithms for Partially Observable Markov Decision Processes [D]. Canberra: The Australian National University, 2003.
  • 2. Cassandra A R. Exact and Approximate Algorithms for Partially Observable Markov Decision Processes [D]. USA: Brown University, 1998.
  • 3. Papadimitriou C H, Tsitsiklis J N. The complexity of Markov decision processes [J]. Mathematics of Operations Research, 1987, 12(3): 441-450.
  • 4. Nourbakhsh I R, Powers R, Birchfield S. DERVISH: An office-navigating robot [J]. AI Magazine, 1995, 16(2): 53-60.
  • 5. Simmons R, Koenig S. Probabilistic robot navigation in partially observable environments [A]. Proceedings of the International Joint Conference on Artificial Intelligence [C]. San Francisco, CA, USA: Morgan Kaufmann Publishers, 1995. 1080-1087.
  • 6. Littman M L, Cassandra A R, Kaelbling L P. Learning policies for partially observable environments: Scaling up [A]. Proceedings of the Twelfth International Conference on Machine Learning [C]. San Francisco, CA, USA: Morgan Kaufmann Publishers, 1995. 362-370.
  • 7. Cao X R. Basic ideas for event-based optimization of Markov systems [J]. Discrete Event Dynamic Systems, 2005, 15(2): 169-197.
  • 8. van Dijk N M. Queuing Networks and Product Forms: A Systems Approach [M]. New York, USA: John Wiley and Sons, 1993.

Co-cited literature (16)

  • 1. Xi-Ren Cao. Basic Ideas for Event-Based Optimization of Markov Systems [J]. Discrete Event Dynamic Systems, 2005 (2).
  • 2. X. R. Cao. Single Sample Path-Based Optimization of Markov Chains [J]. Journal of Optimization Theory and Applications, 1999 (3).
  • 3. Li Y J, Yin B Q, Xi H S. Partially observable Markov decision processes and performance sensitivity analysis. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, 2008.
  • 4. H. T. Fang, X. R. Cao. Potential-based online policy iteration algorithms for Markov decision processes. IEEE Transactions on Automatic Control, 2004.
  • 5. K. W. Ross, D. Tsang. The stochastic knapsack problem. IEEE Transactions on Communications, 1989.
  • 6. K. W. Ross, D. Tsang. Optimal circuit access policies in an ISDN environment: A Markov decision approach. IEEE Transactions on Communications, 1989.
  • 7. F. Ma, W. N. Zhou, J. D. Song, G. X. Xu. Research on Admission Control Mechanism in Heterogeneous Network Platform. International Symposium on Intelligence Information Technology Application, IEEE, 2009.
  • 8. Hui Zhang, Xuming Fang. A Pricing and Game Theory-Based Call Admission Control Scheme for CDMA Systems. International Conference on Wireless Communications, Networking and Mobile Computing, 2007.
  • 9. B. Praveen, J. Praveen, C. S. R. Murthy. On providing elastic QoS in optical burst switched networks. Proceedings of 2nd International Conference on Broadband Networks, 2005.
  • 10. Buhler J, Wunder G. Optimal Dynamic Admission Control in Heterogeneous Networks. Proc. of International ITG Workshop on Smart Antennas, 2008.

Citing literature (1)

  • 1. Fu-Shou Lin (1), Bao-Qun Yin (1, 2), Jing Huang (1), Xu-Min Wu (1). Affiliations: (1) Key Lab of Anhui Network Communication System and Control, University of Science and Technology of China, Hefei 230027, China; (2) National Network New Media Engineering Research Center, Institute of Acoustics, Chinese Academy of Science, Beijing 100190, China. Admission Control with Elastic QoS for Video on Demand Systems [J]. International Journal of Automation and Computing, 2012, 9(5): 467-473. Cited by: 4

Second-level citing literature (4)
