期刊文献+
共找到1篇文章
< 1 >
每页显示 20 50 100
A Lyapunov characterization of robust policy optimization
1
作者 Leilei Cui Zhong-Ping Jiang 《Control Theory and Technology》 EI CSCD 2023年第3期374-389,共16页
In this paper,we study the robustness property of policy optimization(particularly Gauss-Newton gradient descent algorithm which is equivalent to the policy iteration in reinforcement learning)subject to noise at each... In this paper,we study the robustness property of policy optimization(particularly Gauss-Newton gradient descent algorithm which is equivalent to the policy iteration in reinforcement learning)subject to noise at each iteration.By invoking the concept of input-to-state stability and utilizing Lyapunov's direct method,it is shown that,if the noise is sufficiently small,the policy iteration algorithm converges to a small neighborhood of the optimal solution even in the presence of noise at each iteration.Explicit expressions of the upperbound on the noise and the size of the neighborhood to which the policies ultimately converge are provided.Based on Willems'fundamental lemma,a learning-based policy iteration algorithm is proposed.The persistent excitation condition can be readily guaranteed by checking the rank of the Hankel matrix related to an exploration signal.The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is theoretically demonstrated by the input-to-state stability of the policy iteration.Several numerical simulations are conducted to demonstrate the efficacy of the proposed method. 展开更多
关键词 policy optimization policy iteration(PI)-Input-to-state stability(ISS) Lyapunov's direct method
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部