
A Kernel Based Reinforcement Learning Algorithm (基于核方法的强化学习算法)

Cited by: 1
Abstract: Traditional reinforcement learning algorithms usually assume that the state and action spaces are discrete, yet many real-world tasks inherently have continuous state spaces, which greatly limits the practical use of reinforcement learning. To overcome this shortcoming, this paper develops a kernel-based reinforcement learning algorithm that handles problems with continuous state spaces directly. The algorithm is validated on the classical mountain car task, which has a continuous state space and a discrete action space. Experiments show that, compared with the traditional approach of discretizing the state space first, the algorithm converges to better policies with less training data (a minimal illustrative sketch of the kernel-based approach follows the keyword list below).
Authors: He Yuan, Zhang Wensheng (何源, 张文生)
Source: Microcomputer Information (《微计算机信息》, Control & Automation), Peking University Core Journal, 2008, No. 4, pp. 243-245 (3 pages)
Funding: National Basic Research Program of China (973 Program), project "Machine Learning and Data Description", Grant No. 2004CB318103
Keywords: reinforcement learning; kernel method; Markov decision process; Q-learning; mountain car
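
The abstract describes the method only at a high level. To make the idea concrete, here is a minimal, hypothetical Python sketch of kernel-based value iteration in the spirit of reference 5 below (Ormoneit and Sen, 2002), applied to the mountain car task the paper uses for validation. Everything in the sketch is an illustrative assumption rather than the paper's implementation: the function names (mc_step, collect_samples, kernel_q_iteration), the Gaussian kernel, the per-dimension bandwidths, and the uniform random sampling scheme.

```python
import numpy as np

# Standard mountain car dynamics (Sutton & Barto, 1998): continuous state
# (position, velocity), three discrete actions {-1, 0, +1}.
def mc_step(state, action):
    x, v = state
    v = np.clip(v + 0.001 * action - 0.0025 * np.cos(3 * x), -0.07, 0.07)
    x = np.clip(x + v, -1.2, 0.6)
    if x == -1.2:
        v = max(v, 0.0)          # inelastic collision with the left wall
    reward = 0.0 if x >= 0.6 else -1.0
    return np.array([x, v]), reward

def collect_samples(n_per_action, actions=(-1, 0, 1), seed=0):
    # Uniform random exploration (an assumption): store (s, r, s')
    # transition arrays separately for each discrete action.
    rng = np.random.default_rng(seed)
    samples = {}
    for a in actions:
        S = np.column_stack([rng.uniform(-1.2, 0.6, n_per_action),
                             rng.uniform(-0.07, 0.07, n_per_action)])
        S2, R = zip(*(mc_step(s, a) for s in S))
        samples[a] = (S, np.array(R), np.array(S2))
    return samples

def gaussian_kernel(u):
    return np.exp(-0.5 * np.sum(u * u, axis=-1))

def kernel_q_iteration(samples, gamma=0.99, bandwidth=0.1, sweeps=30):
    # Kernel-weighted approximate value iteration: Q(s, a) is a kernel
    # average of r_i + gamma * max_b Q(s'_i, b) over the stored transitions
    # that were taken with action a (after Ormoneit & Sen, 2002).
    actions = list(samples)
    scale = np.array([1.8, 0.14]) * bandwidth   # per-dimension bandwidths
    q_next = {a: np.zeros(len(samples[a][1])) for a in actions}

    def q_value(s, a, qn):
        S, R, _ = samples[a]
        w = gaussian_kernel((S - s) / scale)
        return w @ (R + gamma * qn[a]) / (w.sum() + 1e-12)

    for _ in range(sweeps):
        # The comprehension reads the previous sweep's q_next (synchronous update).
        q_next = {a: np.array([max(q_value(s2, b, q_next) for b in actions)
                               for s2 in samples[a][2]])
                  for a in actions}
    return lambda s, a: q_value(np.asarray(s, dtype=float), a, q_next)

# Greedy action at the classic start state (-0.5, 0).
q = kernel_q_iteration(collect_samples(200))
print(max((-1, 0, 1), key=lambda a: q([-0.5, 0.0], a)))
```

The kernel average acts as a soft nearest-neighbour interpolator over stored transitions, which is what lets such a method operate on a continuous state space without discretizing it first; the bandwidth controls the bias-variance trade-off of the value estimate.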

References (7)

  • 1. Sutton, R. S. and Barto, A. G. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
  • 2. Ye Deqian, Jin Dabing, Yang Ying. Research and design of a stock prediction system based on reinforcement learning [J]. Microcomputer Information, 2006, 22(02X): 149-151. (Cited: 4)
  • 3. Singh, S. and Bertsekas, D. Reinforcement learning for dynamic channel allocation in cellular telephone systems. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, volume 9, page 974. The MIT Press, 1997.
  • 4. Tesauro, G. Neurogammon wins Computer Olympiad. Neural Computation, 1(3): 321-323, 1989.
  • 5. Ormoneit, D. and Sen, S. Kernel-based reinforcement learning. Machine Learning, 49: 161-178, 2002.
  • 6. Fan, J. and Gijbels, I. Local Polynomial Modelling and Its Applications. Chapman and Hall, 1996.
  • 7. Jong, Nicholas K. and Stone, Peter. Kernel-based models for reinforcement learning. Kernel Machines for Reinforcement Learning Workshop, Pittsburgh, PA, 2006.

Secondary References (2)

Co-citing Literature (3)

Co-cited Literature (31)

  • 1. Chen Zonghai, Wen Feng, Nie Jianbin, Wu Xiaoshu. Reinforcement learning method based on a node-growing k-means clustering algorithm [J]. Journal of Computer Research and Development, 2006, 43(4): 661-666. (Cited: 13)
  • 2. Gao Yang, Hu Jingkai, Wang Bennian, Wang Dongli. Elevator group control scheduling based on CMAC network reinforcement learning [J]. Acta Electronica Sinica, 2007, 35(2): 362-365. (Cited: 13)
  • 3. Su S, Lee Z, Wang Y. Robust and fast learning for fuzzy cerebellar model articulation controllers. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2006, 36(1): 203-208.
  • 4. Ernst D, Geurts P, Wehenkel L. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 2005, 6: 503-556.
  • 5. Monson, Christopher Kenneth. Reinforcement learning in the joint space: value iteration in worlds with continuous states and actions. Master's thesis, Brigham Young University, 2003.
  • 6. Baird L. Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the Twelfth International Conference on Machine Learning, Morgan Kaufmann, 1995: 30-37.
  • 7. Munos, Remi and Moore, Andrew. Variable resolution discretization in optimal control. Machine Learning, 2002, 49: 291-323.
  • 8. Whiteson, Shimon and Stone, Peter. Evolutionary function approximation for reinforcement learning. Journal of Machine Learning Research, 2006, 7: 877-917.
  • 9. Bertsekas D P, Tsitsiklis J N. Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.
  • 10. Pazis, Jason and Lagoudakis, Michail G. Learning continuous action control policies. 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009: 169-176.

Citing Literature (1)

Secondary Citing Literature (2)
