Abstract
The optimal control problem with a long-run average cost is investigated for unknown linear discrete-time systems with additive noise. The authors propose a value iteration-based stochastic adaptive dynamic programming (VI-based SADP) algorithm and use it to obtain the optimal controller. Unlike existing related work, the algorithm does not need to estimate the expectation (conditional expectation) or variance (conditional variance) of the states or other relevant variables, and its convergence is proved rigorously. A simulation example verifies the effectiveness of the proposed approach.
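The abstract does not spell out the VI-based SADP iteration itself. As a point of reference only, the sketch below shows classical model-based value iteration on the discrete-time algebraic Riccati equation, which is the standard fixed-point problem behind average-cost LQ control with additive noise. Under the usual assumptions (zero-mean i.i.d. noise with covariance W, (A, B) stabilizable, (A, Q^{1/2}) detectable), the optimal policy is u_k = -K x_k and the optimal long-run average cost is tr(PW). Unlike the authors' algorithm, this sketch assumes A and B are known. The function names `vi_riccati` and `average_cost`, the single-input restriction, and the scalar example are hypothetical illustrations, not the paper's method.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain matrix product X @ Y using nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    return [list(row) for row in zip(*X)]


def vi_riccati(A: Matrix, B: Matrix, Q: Matrix, R: float,
               tol: float = 1e-12, max_iter: int = 10_000) -> Tuple[Matrix, Matrix]:
    """Model-based value iteration on the discrete-time Riccati equation.

    Assumption: single input, so B is n x 1 and R is a positive scalar.
    Starts from P_0 = 0 and iterates
        P_{j+1} = Q + A'P_jA - A'P_jB (R + B'P_jB)^{-1} B'P_jA
    until successive iterates differ by less than tol (max-abs entry).
    Returns (P, K) with the 1 x n gain K so that u_k = -K x_k.
    """
    n = len(A)
    At, Bt = transpose(A), transpose(B)
    P = [[0.0] * n for _ in range(n)]
    for _ in range(max_iter):
        PA, PB = matmul(P, A), matmul(P, B)
        AtPA = matmul(At, PA)
        AtPB = matmul(At, PB)            # n x 1
        BtPA = matmul(Bt, PA)            # 1 x n
        s = R + matmul(Bt, PB)[0][0]     # scalar R + B'P_jB
        P_next = [[Q[i][j] + AtPA[i][j] - AtPB[i][0] * BtPA[0][j] / s
                   for j in range(n)] for i in range(n)]
        diff = max(abs(P_next[i][j] - P[i][j]) for i in range(n) for j in range(n))
        P = P_next
        if diff < tol:
            break
    # Feedback gain from the final iterate: K = (R + B'PB)^{-1} B'PA
    s = R + matmul(Bt, matmul(P, B))[0][0]
    K = [[v / s for v in matmul(Bt, matmul(P, A))[0]]]
    return P, K


def average_cost(P: Matrix, W: Matrix) -> float:
    """Optimal long-run average cost tr(P W) for noise covariance W."""
    PW = matmul(P, W)
    return sum(PW[i][i] for i in range(len(P)))


if __name__ == "__main__":
    # Hypothetical scalar example: a = b = q = r = 1, noise variance 1.
    P, K = vi_riccati([[1.0]], [[1.0]], [[1.0]], 1.0)
    print(P[0][0], K[0][0], average_cost(P, [[1.0]]))
```

In the scalar example the fixed point satisfies p^2 - p - 1 = 0, so P should approach the golden ratio (1 + √5)/2 and K should approach its reciprocal. A data-driven scheme such as the one in the abstract would aim to reach a comparable solution from measured trajectories instead of from A and B.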
Funding
Supported by the National Natural Science Foundation of China under Grant No. 61673284 and the Science Development Project of Sichuan University under Grant No. 2020SCUNL201.