
A Multiple-Goal Sarsa(λ) Algorithm Based on Lost Reward of Greatest Mass (cited by: 3)
Abstract: For the multiple-goal reinforcement learning problem typified by RoboCup, a multiple-goal reinforcement learning algorithm based on the lost reward of greatest mass, LRGM-Sarsa(λ), is proposed. The algorithm estimates the lost reward of greatest mass for each sub-goal and, balancing the long-term rewards of the sub-goals, selects the best joint action to produce a composite policy. For training each individual sub-goal, a Sarsa(λ) algorithm based on the B error function, an improvement on the MSBR error function, is adopted, and the action-selection probability function and the step-size parameter α are optimized accordingly. The B error function guarantees convergence of the value prediction under non-linear function approximation, resolving the instability and non-convergence that reinforcement learning otherwise exhibits in that setting. The algorithm is applied to training the local shooting policy in RoboCup 2D, where experimental results show that it is more stable and converges faster, demonstrating its effectiveness.
Source: Acta Electronica Sinica (《电子学报》), 2013, No. 8, pp. 1469-1473 (5 pages); indexed by EI, CAS, CSCD, and the Peking University Core Journal list.
Funding: National Natural Science Foundation of China (Nos. 61070223, 61103045, 61272005, 61170020); Natural Science Foundation of Jiangsu Province (No. BK2012616); Natural Science Research Program of Jiangsu Higher Education Institutions (Nos. 09KJA520002, 09KJB520012); Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (No. 93K172012K04).
Keywords: multiple-goal; adaptive Sarsa(λ); lost reward of greatest mass; reinforcement learning; RoboCup 2D
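The abstract describes two coupled mechanisms: per-sub-goal Sarsa(λ) learners with eligibility traces, and a greatest-mass rule that combines the sub-goals' value estimates to select one joint action. The sketch below is a minimal tabular illustration in Python of that structure, not the paper's method: it uses the classical greatest-mass criterion (summing Q-values across sub-goals) as a stand-in for the paper's lost-reward-of-greatest-mass criterion, and it does not reproduce the B error function, the non-linear function approximation, or the adaptive step size. All names and sizes (N_GOALS, q, trace, the ε-greedy constant) are hypothetical.

```python
import numpy as np

# Hypothetical sizes and constants; the paper's RoboCup shooting task
# has its own state/action encoding, which is not reproduced here.
N_GOALS, N_STATES, N_ACTIONS = 2, 100, 5
ALPHA, GAMMA, LAM, EPSILON = 0.1, 0.9, 0.8, 0.1

# One Q-table and one eligibility-trace table per sub-goal.
q = np.zeros((N_GOALS, N_STATES, N_ACTIONS))
trace = np.zeros_like(q)

def select_action(state: int) -> int:
    """Greatest-mass action selection: rank each action by the sum of
    its Q-values over all sub-goals, then act epsilon-greedily."""
    if np.random.rand() < EPSILON:
        return int(np.random.randint(N_ACTIONS))
    return int(np.argmax(q[:, state, :].sum(axis=0)))

def sarsa_lambda_update(s, a, rewards, s_next, a_next):
    """One on-policy Sarsa(lambda) backup per sub-goal; `rewards`
    holds one scalar reward per sub-goal for this transition."""
    for g in range(N_GOALS):
        delta = rewards[g] + GAMMA * q[g, s_next, a_next] - q[g, s, a]
        trace[g, s, a] += 1.0           # accumulating trace
        q[g] += ALPHA * delta * trace[g]
        trace[g] *= GAMMA * LAM         # decay all traces
```

In the paper itself the single-goal learners rely on the B error function to keep value prediction stable under non-linear function approximation, and the step-size parameter α is adapted during training; a fixed-α tabular sketch like this one does not capture either refinement.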

