Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems


Abstract: This paper presents an off-policy integral reinforcement learning (IRL) algorithm to obtain the optimal tracking control of unknown chaotic systems. Off-policy IRL can learn the solution of the Hamilton–Jacobi–Bellman (HJB) equation from system data generated by an arbitrary control. Moreover, off-policy IRL can be regarded as a direct learning method, which avoids the identification of system dynamics. The performance index function is first defined in terms of the system tracking error and the control error. An off-policy IRL algorithm is then proposed to solve the HJB equation. It is proven that the iterative control makes the tracking error system asymptotically stable and that the iterative performance index function converges. A simulation study demonstrates the effectiveness of the developed tracking control method.
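As a rough illustration of what a performance index built from the tracking error and the control error can look like, the sketch below uses a standard quadratic form together with the integral (IRL) form of the Bellman equation. The symbols x_d, u_d, Q, R, and T are assumptions introduced here for illustration; the abstract does not state the paper's exact definitions.

```latex
% Illustrative notation only: x_d, u_d, Q, R, T are assumed symbols, not quoted from the paper.
% Tracking error and control error relative to a desired trajectory x_d and its reference control u_d:
e(t) = x(t) - x_d(t), \qquad u_e(t) = u(t) - u_d(t)

% A quadratic performance index, assuming Q \succeq 0 and R \succ 0:
J(e(t)) = \int_t^{\infty} \left( e^{\top}(\tau)\, Q\, e(\tau) + u_e^{\top}(\tau)\, R\, u_e(\tau) \right) d\tau

% Integral reinforcement form over a window of length T > 0:
J(e(t)) = \int_t^{t+T} \left( e^{\top}(\tau)\, Q\, e(\tau) + u_e^{\top}(\tau)\, R\, u_e(\tau) \right) d\tau + J(e(t+T))
```

The last relation is why measured trajectory data over windows of length T can stand in for an explicit model of the system dynamics in this family of methods.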

Authors: Wei Qinglai (魏庆来), Song Ruizhuo (宋睿卓), Sun Qiuye (孙秋野), Xiao Wendong (肖文栋)

Affiliations: The State Key Laboratory of Management and Control for Complex Systems; School of Automation and Electrical Engineering; School of Information Science and Engineering

Source: Chinese Physics B (中国物理B, English edition), 2015, Issue 9, pp. 147-152 (6 pages). Indexed in SCIE, EI, CAS, CSCD.

Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 61304079 and 61374105), the Beijing Natural Science Foundation, China (Grant Nos. 4132078 and 4143065), the China Postdoctoral Science Foundation (Grant No. 2013M530527), the Fundamental Research Funds for the Central Universities, China (Grant No. FRF-TP-14-119A2), and the Open Research Project from State Key Laboratory of Management and Control for Complex Systems, China (Grant No. 20150104).

Keywords: adaptive dynamic programming, approximate dynamic programming, chaotic system, optimal tracking control

Classification: O415.5 [Science: Theoretical Physics]; O231 [Science: Operations Research and Cybernetics]


