Approximate Dynamic Programming for Self-Learning Control (cited by: 14)

Abstract: This paper introduces a self-learning control approach based on approximate dynamic programming. Dynamic programming was introduced by Bellman in the 1950s for solving optimal control problems of nonlinear dynamical systems. Because of its high computational complexity, its applications have been limited to small, simple problems. The key step in finding approximate solutions to dynamic programming is to estimate the performance index. The optimal control signal can then be determined by minimizing (or maximizing) the estimated performance index. Artificial neural networks are efficient tools for representing the performance index. This paper assumes the use of neural networks both for estimating the performance index and for generating optimal control signals, thereby achieving optimal control through self-learning.
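The two steps named in the abstract (estimate the performance index, then pick the control that minimizes it) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the scalar plant x_next = A*x + B*u, the quadratic utility, the discount factor, the grid of candidate controls, and all function names. A one-parameter quadratic approximator stands in for the neural network critic so that the example stays dependency-free.

```python
# Illustrative sketch only: hypothetical plant, cost, and critic (not from the paper).
A, B = 1.0, 1.0      # assumed scalar plant: x_next = A*x + B*u
GAMMA = 0.95         # assumed discount factor

# Candidate control signals and training states (both arbitrary grids).
CONTROLS = [i / 100.0 for i in range(-300, 301)]
STATES = [i / 4.0 for i in range(-8, 9) if i != 0]


def utility(x, u):
    """One-step cost; a quadratic form is assumed for simplicity."""
    return x * x + u * u


def critic(w, x):
    """Estimated performance index J(x) ~ w * x^2 (stand-in for a neural network)."""
    return w * x * x


def q_value(w, x, u):
    """One-step cost plus discounted estimated cost-to-go from the next state."""
    return utility(x, u) + GAMMA * critic(w, A * x + B * u)


def greedy_control(w, x):
    """Pick the candidate control that minimizes the estimated performance index."""
    return min(CONTROLS, key=lambda u: q_value(w, x, u))


def train_critic(iterations=30):
    """Repeatedly refit the critic to Bellman targets by least squares."""
    w = 0.0
    for _ in range(iterations):
        targets = [q_value(w, x, greedy_control(w, x)) for x in STATES]
        # Closed-form least-squares fit of w in the model target ~ w * x^2.
        num = sum(t * x * x for t, x in zip(targets, STATES))
        den = sum(x ** 4 for x in STATES)
        w = num / den
    return w


if __name__ == "__main__":
    w = train_critic()
    print("critic weight:", w)
    print("control at x = 1.0:", greedy_control(w, 1.0))
```

For this particular linear-quadratic setup the fitted weight should approach the solution of the discounted scalar Riccati equation, which gives a convenient sanity check. A neural network critic would replace the closed-form fit with gradient-based training on the same Bellman targets.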

Author: Derong Liu

Affiliation: Department of Electrical and Computer Engineering

Source: Acta Automatica Sinica (《自动化学报》), indexed in EI, CSCD, and the Peking University Core Journals list, 2005, No. 1, pp. 13-18 (6 pages)

Funding: Supported by the National Science Foundation (U.S.A.) under Grant ECS-0355364

Keywords: approximate dynamic programming, self-learning control, neural networks, artificial intelligence

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
