An Efficient Solution Algorithm for Factored MDP Using Feature Vector Extraction (cited: 3)
Abstract: In typical factored Markov decision process (FMDP) models such as robot soccer, different state attributes affect state evaluation to different degrees in different states, and a few key attributes can determine the quality of the current state either uniquely or approximately. To address the curse of dimensionality that is pervasive in FMDP models, this paper approximates the (nonlinear) state value function by extracting state feature vectors. Depending on how much is known about the FMDP model, the method works from two solution perspectives: in linear programming it simplifies the set of constraint inequalities, and in reinforcement learning it transplants the learned state value function to higher-dimensional models, thereby reducing computational complexity and accelerating the generation of joint policies. Experiments on free-kick cooperation in robot soccer (RoboCup) verify the effectiveness of the feature-vector-based reinforcement learning algorithm and the transferability of its learning results. Compared with traditional reinforcement learning, the algorithm greatly speeds up policy learning; more importantly, the learned state value function can be conveniently transplanted to a higher-dimensional FMDP model, so that the joint policy can be computed directly without relearning.
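As a rough illustration of the idea summarized in the abstract, the sketch below (Python) shows a linear value approximation V(s) ≈ w·φ(s) over an extracted feature vector, trained with simple TD(0) updates. This is a minimal sketch under assumptions of my own: the class FeatureValueApproximator, the feature extractor free_kick_features, the assumed state layout, and all parameter values are hypothetical and are not taken from the paper, whose actual feature extraction and learning procedure may differ.

import numpy as np

class FeatureValueApproximator:
    # Linear state-value approximation V(s) ~ w . phi(s) over extracted features.
    # phi maps a (possibly high-dimensional, factored) state to a fixed-length
    # feature vector; because the weight dimension depends only on len(phi(s)),
    # the same weights can be reused in a larger FMDP that exposes the same
    # features, which mirrors the "transplant" idea stressed in the abstract.
    def __init__(self, phi, n_features, alpha=0.1, gamma=0.95):
        self.phi = phi                    # feature extraction function
        self.w = np.zeros(n_features)     # weight vector to learn
        self.alpha = alpha                # learning rate
        self.gamma = gamma                # discount factor

    def value(self, state):
        return float(np.dot(self.w, self.phi(state)))

    def td_update(self, state, reward, next_state, terminal=False):
        # One TD(0) update of the feature weights.
        target = reward if terminal else reward + self.gamma * self.value(next_state)
        delta = target - self.value(state)
        self.w = self.w + self.alpha * delta * self.phi(state)
        return delta

# Hypothetical feature extractor for a free-kick scenario: the features depend
# only on relational quantities (distances), not on the raw number of players,
# so weights learned in a small game can be reused when more players are added.
def free_kick_features(state):
    ball, goal, nearest_opponent = state                      # assumed state layout
    return np.array([
        1.0,                                                   # bias term
        np.linalg.norm(np.subtract(goal, ball)),               # ball-to-goal distance
        np.linalg.norm(np.subtract(nearest_opponent, ball)),   # opponent pressure
    ])

approx = FeatureValueApproximator(free_kick_features, n_features=3)
s      = ((10.0, 5.0), (52.5, 0.0), (15.0, 3.0))
s_next = ((20.0, 2.0), (52.5, 0.0), (15.0, 3.0))
approx.td_update(s, reward=1.0, next_state=s_next)

Keeping the feature dimension fixed while the underlying state space grows is what would allow such learned weights to be moved to a higher-dimensional model without relearning; per the abstract, the paper also uses the same feature-based approximation to simplify the constraint set in its linear-programming formulation.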
Source: Journal of Software (《软件学报》), 2005, Issue 5, pp. 733-743 (11 pages); indexed in EI, CSCD, and the Peking University Core list.
Funding: National Natural Science Foundation of China; National High-Tech Research and Development Program of China (863 Program).
Keywords: multi-Agent cooperative problem solving; factored Markov decision process; linear programming; reinforcement learning; curse of dimensionality