Adaptive Multi-Step Evaluation Design With Stability Guarantee for Discrete-Time Optimal Learning Control (cited by: 1)

Abstract: This paper is concerned with a novel integrated multi-step heuristic dynamic programming (MsHDP) algorithm for solving optimal control problems. It is shown that, initialized by the zero cost function, MsHDP can converge to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. Then, the stability of the system is analyzed under the control policies generated by MsHDP. A general stability criterion is also designed to determine the admissibility of the current control policy; the criterion applies not only to traditional value iteration and policy iteration but also to MsHDP. Further, based on the convergence result and the stability criterion, the integrated MsHDP algorithm using immature control policies is developed to greatly accelerate learning. An actor-critic structure is used to implement the integrated MsHDP scheme, with neural networks serving as the parametric architecture that evaluates and improves the iterative policy. Finally, two simulation examples demonstrate that the learning effectiveness of the integrated MsHDP scheme surpasses that of other fixed or integrated methods.
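The multi-step evaluation idea in the abstract can be illustrated on a linear-quadratic problem, where the HJB equation reduces to the discrete-time algebraic Riccati equation. The following is a minimal sketch under stated assumptions, not the paper's implementation: the system matrices, the function names (`greedy_gain`, `multi_step_evaluation`, `riccati_residual`), and the step count `N_STEPS` are all hypothetical, the value function is taken to be quadratic, V(x) = x'Px, instead of a neural network critic, and the step count is fixed rather than adapted. The spectral-radius check at the end is a simple stand-in for linear systems and is not the general stability criterion the paper designs.

```python
import numpy as np

# Hypothetical system x_{k+1} = A x_k + B u_k with stage cost x'Qx + u'Ru.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
N_STEPS = 3     # evaluation horizon; 1 corresponds to value iteration
N_ITERS = 200   # number of improvement/evaluation sweeps


def greedy_gain(A, B, R, P):
    """Policy improvement: gain K of u = -K x minimizing u'Ru + (Ax + Bu)'P(Ax + Bu)."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def multi_step_evaluation(A, B, Q, R, P, K, n_steps):
    """Sum n_steps stage costs under u = -K x, then add the old value at the final state."""
    A_cl = A - B @ K
    stage = Q + K.T @ R @ K
    P_new = P
    for _ in range(n_steps):
        P_new = stage + A_cl.T @ P_new @ A_cl
    return P_new


def riccati_residual(A, B, Q, R, P):
    """Largest absolute entry of the Riccati (Bellman) equation residual at P."""
    K = greedy_gain(A, B, R, P)
    return np.max(np.abs(Q + A.T @ P @ (A - B @ K) - P))


# Start from the zero cost function, as in the abstract.
P = np.zeros((2, 2))
for _ in range(N_ITERS):
    K = greedy_gain(A, B, R, P)
    P = multi_step_evaluation(A, B, Q, R, P, K, N_STEPS)

K = greedy_gain(A, B, R, P)
residual = riccati_residual(A, B, Q, R, P)
# Stand-in stability check: closed-loop spectral radius below one.
rho = np.max(np.abs(np.linalg.eigvals(A - B @ K)))
print("Riccati residual:", residual)
print("closed-loop spectral radius:", rho)
```

Setting `N_STEPS = 1` turns each sweep into one value-iteration update, while a very large `N_STEPS` approaches full policy evaluation as in policy iteration, which is the range of methods the abstract says the stability criterion covers.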

Authors: Ding Wang, Jiangyu Wang, Mingming Zhao, Peng Xin, Junfei Qiao

Affiliation: IEEE Faculty of Information Technology

Source: IEEE/CAA Journal of Automatica Sinica (自动化学报, English edition); indexed in SCIE, EI, CSCD; 2023, No. 9, pp. 1797-1809 (13 pages)

Funding: the National Key Research and Development Program of China (2021ZD0112302), the National Natural Science Foundation of China (62222301, 61890930-5, 62021003), and the Beijing Natural Science Foundation (JQ19013).

Keywords: adaptive critic; artificial neural networks; Hamilton-Jacobi-Bellman (HJB) equation; multi-step heuristic dynamic programming; multi-step reinforcement learning; optimal control

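Classification: O232 [Science: Operations Research and Cybernetics]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]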
