Abstract
Dear Editor, This letter investigates the multi-objective optimal control problem for nonlinear discrete-time systems. A data-driven policy gradient algorithm is proposed in which the action-state value function is used to evaluate the policy, and a policy gradient based method is employed in the policy improvement step.
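As a rough illustration of the evaluate-then-improve loop described above, the following Python sketch may help. It is not the letter's algorithm: the scalar plant, the two cost terms, the weighted-sum scalarization, the discount factor, the linear policy, and the finite-difference gradient are all assumptions made for this example, and every function name is hypothetical.

```python
import math

# Assumed scalarization weights for the two objectives and an assumed discount factor.
WEIGHTS = (0.7, 0.3)
GAMMA = 0.9
HORIZON = 40  # rollout length used to approximate the infinite-horizon sum
STATES = (-1.0, -0.5, 0.5, 1.0)  # sampled states used for the gradient estimate


def step(x, u):
    """Hypothetical scalar nonlinear plant; the learner only samples it."""
    return 0.8 * x + 0.1 * math.sin(x) + u


def costs(x, u):
    """Two illustrative objectives: state regulation and control effort."""
    return (x * x, u * u)


def policy(theta, x):
    """Linear state-feedback policy with a single parameter theta."""
    return -theta * x


def q_value(theta, x, u):
    """Policy evaluation: scalarized Q(x, u), estimated by applying u once
    and then following the policy along a sampled rollout."""
    total, discount = 0.0, 1.0
    for _ in range(HORIZON):
        c1, c2 = costs(x, u)
        total += discount * (WEIGHTS[0] * c1 + WEIGHTS[1] * c2)
        x = step(x, u)
        u = policy(theta, x)
        discount *= GAMMA
    return total


def improve(theta, states=STATES, lr=0.05, eps=1e-4):
    """Policy improvement: one gradient step on theta, using dQ/du
    (central finite difference) times d(policy)/d(theta)."""
    grad = 0.0
    for x in states:
        u = policy(theta, x)
        dq_du = (q_value(theta, x, u + eps) - q_value(theta, x, u - eps)) / (2 * eps)
        grad += dq_du * (-x)  # d(policy)/d(theta) = -x for the linear policy
    return theta - lr * grad / len(states)


def average_cost(theta, states=STATES):
    """Mean scalarized cost of following the policy from the sampled states."""
    return sum(q_value(theta, x, policy(theta, x)) for x in states) / len(states)


def train(theta=0.0, iterations=100):
    """Alternate evaluation and improvement for a fixed number of iterations."""
    for _ in range(iterations):
        theta = improve(theta)
    return theta


if __name__ == "__main__":
    learned = train()
    print("learned gain:", learned)
    print("cost before:", average_cost(0.0), "cost after:", average_cost(learned))
```

The weighted sum is only one way to handle several objectives; it is used here because it keeps the sketch short. Changing `WEIGHTS` shifts the learned gain between faster state regulation and lower control effort.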
Funding
the National Natural Science Foundation of China (61922063, 62273255, 62150026)
the Shanghai International Science and Technology Cooperation Project (21550760900, 22510712000)
the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0100)
the Fundamental Research Funds for the Central Universities