An Efficient Solution Algorithm for Factored MDP Using Feature Vector Extraction

Original title: 一种基于特征向量提取的FMDP模型求解方法 (cited by: 3)


Abstract: In typical factored Markov decision process (FMDP) models such as robot soccer (RoboCup), different state attributes influence state evaluation to different degrees in different states, and a few key state attributes can determine, uniquely or at least approximately, how good the current state is. Rather than exploiting the relevance among states to reduce the state space, this paper addresses the curse of dimensionality that is pervasive in large FMDP models by approximating the state value (utility) function through extraction of state feature vectors, for the case where the utility function is nonlinear. Depending on how much is known about the FMDP model, the problem is approached from two solution perspectives: under linear programming, the system of constraint inequalities is simplified; under reinforcement learning, the state value function is transplanted to higher-dimensional models. Both reduce computational complexity and speed up the generation of joint policies. Experiments on free-kick tactical cooperation in robot soccer verify the effectiveness of the feature-vector-based reinforcement learning algorithm and the transferability of its learning results. Compared with a traditional reinforcement learning algorithm, the feature-vector-based algorithm greatly accelerates policy learning. More importantly, the learned state value function can be readily transplanted to a higher-dimensional FMDP model (for example, a RoboCup game with more players), so that a joint policy can be computed directly without learning again.
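The transplanting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in plain TD(0) with a linear approximator (the paper targets nonlinear utility functions), and the chain task, the choice of the first attribute as the only "key" attribute, and the names `features`, `value`, and `td0_train` are all assumptions made for illustration. The point is only that a value function defined over extracted features can be evaluated on a state with more attributes without retraining.

```python
# Toy sketch: every name and the chain task are hypothetical, not from the paper.

def features(state):
    """Map a full state (tuple of attributes) to a short feature vector.

    Assumption: only the first attribute (e.g. distance to goal) is a
    'key' attribute; all remaining attributes are ignored.
    """
    return [1.0, float(state[0])]


def value(weights, state):
    """Linear estimate V(s) = w . phi(s); terminal states (distance 0) are worth 0."""
    if state[0] == 0:
        return 0.0
    return sum(w * f for w, f in zip(weights, features(state)))


def td0_train(episodes=3000, alpha=0.02, gamma=1.0, max_dist=5):
    """TD(0) on a chain where each step moves one unit closer and costs -1."""
    weights = [0.0, 0.0]
    for ep in range(episodes):
        state = (1 + ep % max_dist, 0)  # low-dimensional training state
        while state[0] > 0:
            nxt = (state[0] - 1, 0)
            # Temporal-difference error for the observed transition.
            delta = -1.0 + gamma * value(weights, nxt) - value(weights, state)
            phi = features(state)
            weights = [w + alpha * delta * f for w, f in zip(weights, phi)]
            state = nxt
    return weights


weights = td0_train()
small = value(weights, (3, 0))           # state from the training model
large = value(weights, (3, 1, 0, 1, 1))  # higher-dimensional state, no retraining
print(round(small, 2), round(large, 2))
```

Because `value` depends on a state only through `features`, the weights learned on two-attribute states apply unchanged to the five-attribute state. In a real FMDP the key attributes would have to be identified per task, and that selection is the part this sketch simply assumes.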

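Authors: Zhang Shuangmin (张双民), Shi Chunyi (石纯一)

Affiliation: Department of Computer Science and Technology, Tsinghua University

Source: Journal of Software (《软件学报》; indexed in EI, CSCD, PKU Core), 2005, No. 5, pp. 733-743 (11 pages)

Funding: National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program)

Keywords: multi-Agent cooperative problem solving; factored Markov decision process; linear programming; reinforcement learning; curse of dimensionality

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]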


