
Fuzzy Q-learning in continuous state and action space

Abstract: An adaptive fuzzy Q-learning (AFQL) method based on fuzzy inference systems (FIS) is proposed. The FIS, realized by a normalized radial basis function (NRBF) neural network, is used to approximate the Q-value function, whose input is composed of the state and the action. The rules of the FIS are created incrementally according to the novelty of each state-action pair. Moreover, the premise and consequent parts of the FIS are updated using an extended Kalman filter (EKF). The action applied to the environment is the one that maximizes the FIS output in the current state, and it is obtained through an optimization method. Simulation results on the wall-following task of mobile robots and the inverted pendulum balancing problem demonstrate the superiority and applicability of the proposed AFQL method.
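The following is a minimal Python sketch of the idea described in the abstract, not the authors' algorithm: a normalized RBF network approximates Q over the joint (state, action) input, new rules are added when a state-action pair is sufficiently novel, and the greedy action is found by scanning the network output over candidate actions. The class name NRBFQ, all parameter values, and the plain gradient update (standing in for the paper's EKF update of the premise and consequent parameters) are illustrative assumptions.

```python
import numpy as np

class NRBFQ:
    """Normalized RBF approximator of Q(s, a) over the joint state-action input."""

    def __init__(self, input_dim, width=0.5, novelty_threshold=0.3, lr=0.1):
        self.centers = np.empty((0, input_dim))  # rule centers over (state, action)
        self.weights = np.empty(0)               # consequent weight of each rule
        self.width = width                       # shared Gaussian width
        self.novelty_threshold = novelty_threshold
        self.lr = lr

    def _phi(self, x):
        # Normalized Gaussian activations (the NRBF layer).
        d2 = np.sum((self.centers - x) ** 2, axis=1)
        g = np.exp(-d2 / (2.0 * self.width ** 2))
        return g / (np.sum(g) + 1e-12)

    def value(self, state, action):
        if self.weights.size == 0:
            return 0.0
        x = np.concatenate([state, np.atleast_1d(action)])
        return float(self._phi(x) @ self.weights)

    def maybe_add_rule(self, state, action, init_weight=0.0):
        # Grow the rule base when the (state, action) pair is far from every center.
        x = np.concatenate([state, np.atleast_1d(action)])
        if (self.weights.size == 0 or
                np.min(np.linalg.norm(self.centers - x, axis=1)) > self.novelty_threshold):
            self.centers = np.vstack([self.centers, x])
            self.weights = np.append(self.weights, init_weight)

    def update(self, state, action, td_target):
        # Move the prediction toward the TD target; the paper updates both premise and
        # consequent parameters with an EKF, while this sketch uses a plain gradient step.
        x = np.concatenate([state, np.atleast_1d(action)])
        phi = self._phi(x)
        self.weights += self.lr * (td_target - float(phi @ self.weights)) * phi

    def greedy_action(self, state, candidate_actions):
        # Approximate argmax_a Q(s, a) by scanning a grid of candidate actions.
        values = [self.value(state, a) for a in candidate_actions]
        return candidate_actions[int(np.argmax(values))]
```

A single Q-learning step with this sketch could look as follows; the environment quantities (reward, next_state, gamma) are placeholders:

```python
q = NRBFQ(input_dim=3)                      # e.g. a 2-D state plus a scalar action
actions = np.linspace(-1.0, 1.0, 21)
state = np.array([0.2, -0.1])
a = q.greedy_action(state, actions)
# ... apply a to the environment, observe reward and next_state ...
reward, next_state, gamma = 1.0, np.array([0.25, -0.05]), 0.95
td_target = reward + gamma * max(q.value(next_state, b) for b in actions)
q.maybe_add_rule(state, a, td_target)
q.update(state, a, td_target)
```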
Source: The Journal of China Universities of Posts and Telecommunications (中国邮电高校学报(英文版)), EI, CSCD, 2010, Issue 4: 100-109 (10 pages)
Funding: Supported by the National Natural Science Foundation of China (60703106)
Keywords: Q-learning, FIS, continuous, adaptation

