Learning the optimal state-feedback via supervised imitation learning

Abstract: Imitation learning is a control design paradigm that seeks to learn a control policy reproducing demonstrations from expert agents. By replacing expert demonstrations with optimal behaviours, the same paradigm leads to the design of control policies that closely approximate the optimal state-feedback. This approach requires training a machine learning algorithm (in our case, deep neural networks) directly on state-control pairs originating from optimal trajectories. We have shown in previous work that, when restricted to low-dimensional state and control spaces, this approach is very successful in several deterministic, non-linear problems in continuous time. In this work, we refine our previous studies using as a test case a simple quadcopter model with quadratic and time-optimal objective functions. We describe in detail the best learning pipeline we have developed, which is able to approximate the state-feedback map via deep neural networks to a very high accuracy. We introduce the use of the softplus activation function in the hidden units of neural networks, showing that it results in a smoother control profile whilst retaining the benefits of rectifiers. We show how to evaluate the optimality of the trained state-feedback, and find that already with two layers the objective function reached and its optimal value differ by less than one percent. We also consider an additional metric linked to the system's asymptotic behaviour: the time taken to converge to the policy's fixed point. With respect to these metrics, we show that improvements in the mean absolute error do not necessarily correspond to better policies.
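As a minimal sketch (an illustration, not the authors' code), the softplus activation softplus(z) = log(1 + e^z) can be compared with the rectifier max(0, z): it tends to the rectifier as |z| grows, but its derivative, the logistic sigmoid, is continuous at z = 0, whereas the rectifier's derivative jumps from 0 to 1. A network whose hidden units are all softplus is therefore smooth in its input, which is one way to read the smoother control profile mentioned above. The function names are hypothetical, and the overflow-safe rewriting is a standard numerical trick, not something taken from the paper.

```python
import math


def relu(z: float) -> float:
    """Rectifier max(0, z); its derivative jumps from 0 to 1 at z = 0."""
    return max(0.0, z)


def softplus(z: float) -> float:
    """Softplus log(1 + e^z), rewritten so that exp() never overflows.

    For z >= 0, log(1 + e^z) = z + log(1 + e^-z); for z < 0 it is log(1 + e^z).
    Both cases are covered by max(z, 0) + log1p(e^-|z|).
    """
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def softplus_grad(z: float) -> float:
    """Derivative of softplus: the logistic sigmoid 1 / (1 + e^-z), continuous everywhere."""
    # 0.5 * (1 + tanh(z / 2)) equals the sigmoid and avoids overflow for very negative z.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


if __name__ == "__main__":
    for z in (-20.0, -1.0, 0.0, 1.0, 20.0):
        print(f"z={z:6.1f}  relu={relu(z):8.4f}  softplus={softplus(z):8.4f}  slope={softplus_grad(z):.4f}")
```

Far from the origin the two activations differ by log(1 + e^-|z|), about 2e-9 at |z| = 20, so softplus keeps the rectifier's linear growth for large inputs while removing the kink at zero.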
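The two evaluation metrics described in the abstract (the gap between the objective reached by a state-feedback and its optimal value, and the time taken to converge to the policy's fixed point) can be sketched on a toy problem where the optimal feedback is known in closed form. This is an assumption-laden illustration, not the paper's pipeline: it replaces the quadcopter with a scalar linear system dx/dt = A x + B u with running cost Q x^2 + R u^2 (infinite-horizon LQR), uses two hypothetical hand-built policies in place of trained networks, integrates with fixed-step RK4 over a finite horizon of 10 time units, and takes the first time |dx/dt| drops below a tolerance as the settling time at the fixed point. All function names, the 0.1 offset, the 10% gain error, the horizon and the tolerances are illustrative choices.

```python
import math

# Toy stand-in (assumption): scalar system dx/dt = A*x + B*u with running cost
# Q*x^2 + R*u^2. It is NOT the paper's quadcopter; it is used only because its
# optimal state-feedback and optimal cost are known exactly (scalar LQR).
A, B, Q, R = 1.0, 1.0, 1.0, 1.0

# Scalar algebraic Riccati equation 2*A*P - (B*P)**2 / R + Q = 0, gain K = B*P/R.
K = (A + math.sqrt(A * A + Q * B * B / R)) / B
P = R * K / B


def optimal_policy(x: float) -> float:
    """Optimal state-feedback u = -K*x."""
    return -K * x


def optimal_cost(x0: float) -> float:
    """Infinite-horizon optimal objective J* = P * x0^2."""
    return P * x0 * x0


def mean_abs_error(policy, n: int = 201) -> float:
    """MAE between a policy and the optimal feedback on a uniform grid over [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return sum(abs(policy(x) - optimal_policy(x)) for x in xs) / n


def rollout(policy, x0: float, t_final: float = 10.0, dt: float = 1e-3, tol: float = 1e-3):
    """Closed-loop RK4 simulation of the state and the accumulated cost.

    Returns (cost over [0, t_final], first time |dx/dt| < tol). The second value
    approximates the time to settle at the policy's own fixed point, which need
    not be the origin; it is math.inf if the tolerance is never met.
    """
    def f(x):
        u = policy(x)
        return A * x + B * u, Q * x * x + R * u * u  # (dx/dt, running cost)

    x, cost, t_settle = x0, 0.0, math.inf
    for step in range(1, int(round(t_final / dt)) + 1):
        k1x, k1c = f(x)
        k2x, k2c = f(x + 0.5 * dt * k1x)
        k3x, k3c = f(x + 0.5 * dt * k2x)
        k4x, k4c = f(x + dt * k3x)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        cost += dt * (k1c + 2 * k2c + 2 * k3c + k4c) / 6
        if math.isinf(t_settle) and abs(f(x)[0]) < tol:
            t_settle = step * dt
    return cost, t_settle


def optimality_gap(policy, x0: float = 1.0) -> float:
    """Relative excess of the objective reached over the optimal value J*."""
    cost, _ = rollout(policy, x0)
    return (cost - optimal_cost(x0)) / optimal_cost(x0)


# Hypothetical imperfect approximations standing in for trained networks.
def biased_policy(x: float) -> float:
    return optimal_policy(x) + 0.1  # constant offset: small MAE, nonzero fixed point


def scaled_policy(x: float) -> float:
    return 1.1 * optimal_policy(x)  # 10% gain error: larger MAE, fixed point at 0


if __name__ == "__main__":
    for name, pol in (("optimal", optimal_policy), ("biased", biased_policy), ("scaled", scaled_policy)):
        cost, t_settle = rollout(pol, 1.0)
        gap = (cost - optimal_cost(1.0)) / optimal_cost(1.0)
        print(f"{name:8s} MAE={mean_abs_error(pol):.4f}  gap={gap:.2%}  t_settle={t_settle:.2f}")
```

On this toy problem the biased policy has the lower MAE (0.10 against roughly 0.12 for the scaled one) yet an optimality gap several times larger, because it settles at a nonzero fixed point and keeps paying running cost there; its gap also grows with the chosen horizon. This echoes, on a much simpler system, the abstract's observation that a lower mean absolute error does not by itself imply a better policy.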

Authors: Dharmesh Tailor, Dario Izzo

Affiliation: Advanced Concepts Team

Source: Astrodynamics (航天动力学, English-language journal), CSCD, 2019, Issue 4, pp. 361-374 (14 pages)

Keywords: optimal control; deep learning; imitation learning; G&CNET

Classification number: O17 [Science / Pure Mathematics]
