In this paper, we study the robustness of policy optimization (in particular the Gauss-Newton gradient descent algorithm, which is equivalent to policy iteration in reinforcement learning) subject to noise at each iteration. By invoking the concept of input-to-state stability and using Lyapunov's direct method, we show that, if the noise is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution even in the presence of noise at each iteration. Explicit expressions are provided for the upper bound on the noise and for the size of the neighborhood to which the policies ultimately converge. Based on Willems' fundamental lemma, a learning-based policy iteration algorithm is proposed. The persistent excitation condition can be readily guaranteed by checking the rank of the Hankel matrix related to an exploration signal. The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is demonstrated theoretically through the input-to-state stability of the policy iteration. Several numerical simulations are conducted to demonstrate the efficacy of the proposed method.
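The rank check on the Hankel matrix mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard definition of persistent excitation from the Willems' fundamental lemma literature: a signal with m channels is persistently exciting of order L if its depth-L block Hankel matrix has full row rank mL. The function names `block_hankel` and `is_persistently_exciting`, the signal dimensions, and the random exploration signal are assumptions made for illustration; they are not taken from the paper, and the order required by the paper's algorithm is not stated in the abstract.

```python
import numpy as np


def block_hankel(u, depth):
    """Build the depth-`depth` block Hankel matrix of a signal.

    u has shape (T, m): T samples of an m-channel signal (1-D input is
    treated as a single channel). Column j stacks u[j], ..., u[j+depth-1].
    """
    u = np.asarray(u, dtype=float)
    if u.ndim == 1:
        u = u[:, None]
    T, m = u.shape
    cols = T - depth + 1
    if cols < 1:
        raise ValueError("signal is too short for the requested depth")
    return np.column_stack([u[j:j + depth].reshape(-1) for j in range(cols)])


def is_persistently_exciting(u, order):
    """Return True if the block Hankel matrix of depth `order` has full row rank."""
    H = block_hankel(u, order)
    return bool(np.linalg.matrix_rank(H) == H.shape[0])


# Hypothetical exploration signals, chosen only for illustration.
rng = np.random.default_rng(0)
random_signal = rng.standard_normal((50, 2))  # 50 samples, 2 input channels
constant_signal = np.ones(50)                 # a constant input carries no excitation

pe_random = is_persistently_exciting(random_signal, order=5)
pe_constant = is_persistently_exciting(constant_signal, order=2)
```

A constant signal fails the check already at order 2 because every column of its Hankel matrix is identical, whereas a generic random signal passes as long as it has enough samples for the matrix to have at least as many columns as rows.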
According to a survey of 100 Chinese economists conducted by China's Economic Monitoring and Analysis Centre, National Bureau of Statistics, the economists' confidence index in the second quarter of 2009 was 5.60 on a scale from 1 to 9. This was 1.63 points higher than in the first quarter, continuing the stabilization and renewed upward trend that had appeared since the beginning of 2009.
Funding: Supported in part by the National Science Foundation (Nos. ECCS-2210320, CNS-2148304).