Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 61374105, 61233001, and 61273140) and in part by the Beijing Natural Science Foundation (Grant No. 4132078).
Abstract: In this paper, a novel iterative Q-learning algorithm, called the "policy iteration based deterministic Q-learning algorithm", is developed to solve optimal control problems for discrete-time deterministic nonlinear systems. The idea is to use an iterative adaptive dynamic programming (ADP) technique to construct the iterative control law that optimizes the iterative Q function. Once the optimal Q function is obtained, the optimal control law can be found by directly minimizing it, without requiring a mathematical model of the system. The convergence properties are analyzed, showing that the iterative Q function is monotonically non-increasing and converges to the solution of the optimality equation. It is also proven that each of the iterative control laws is a stable control law. Neural networks are employed to implement the policy iteration based deterministic Q-learning algorithm, approximating the iterative Q function and the iterative control law, respectively. Finally, two simulation examples illustrate the performance of the developed algorithm.
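The policy-iteration recursion described above can be illustrated with a small tabular sketch. It assumes the standard form of policy-iteration Q-learning: policy evaluation Q_i(x_k, u_k) = U(x_k, u_k) + Q_i(x_{k+1}, v_i(x_{k+1})), followed by policy improvement v_{i+1}(x_k) = argmin over u_k of Q_i(x_k, u_k), starting from a stable initial control law. Everything concrete here is hypothetical: a finite state grid instead of a continuous state space, lookup tables instead of the paper's neural networks, and invented clipped dynamics `step` and utility `utility`. The improvement step reads only the Q table, which mirrors the point that minimizing Q needs no system model; `step` is used only as a black-box simulator during evaluation.

```python
# Hypothetical tabular sketch of policy-iteration Q-learning (not the paper's
# neural-network implementation). Grid, controls, dynamics, and utility are invented.

N = 10                        # hypothetical state grid 0..N; state 0 is the equilibrium
CONTROLS = (-2, -1, 0, 1, 2)  # hypothetical finite control set
STATES = range(N + 1)


def step(x, u):
    """Hypothetical deterministic dynamics x_{k+1} = F(x_k, u_k), clipped to the grid."""
    return min(max(x + u, 0), N)


def utility(x, u):
    """Utility U(x, u) = x^2 + u^2; zero only at the equilibrium (0, 0)."""
    return x * x + u * u


def evaluate(policy, max_steps=1000):
    """Policy evaluation: Q_i(x, u) = U(x, u) + Q_i(x', v_i(x')) with x' = F(x, u)."""
    def cost_to_go(x):
        # Follow v_i from x until the equilibrium; this sum equals Q_i(x, v_i(x)).
        total = 0
        for _ in range(max_steps):
            if x == 0:
                return total
            u = policy[x]
            total += utility(x, u)
            x = step(x, u)
        raise ValueError("policy does not drive the state to 0 (not stable)")

    value = {x: cost_to_go(x) for x in STATES}
    return {(x, u): utility(x, u) + value[step(x, u)] for x in STATES for u in CONTROLS}


def improve(q):
    """Policy improvement: v_{i+1}(x) = argmin_u Q_i(x, u), using only the Q table."""
    return {x: min(CONTROLS, key=lambda u: q[(x, u)]) for x in STATES}


def policy_iteration_q(max_iter=50):
    # Initial stable control law: move one cell toward the equilibrium.
    policy = {x: (-1 if x > 0 else 0) for x in STATES}
    q_history = []
    for _ in range(max_iter):
        q = evaluate(policy)
        q_history.append(q)
        new_policy = improve(q)
        if new_policy == policy:  # greedy policy unchanged: Q has converged
            break
        policy = new_policy
    return policy, q_history


if __name__ == "__main__":
    policy, q_history = policy_iteration_q()
    # Each iterative Q function should be pointwise no larger than the previous one.
    for prev, cur in zip(q_history, q_history[1:]):
        assert all(cur[k] <= prev[k] for k in cur)
    print("iterations:", len(q_history))
    print("final control law:", policy)
```

On this toy problem every improved law still reaches the equilibrium (otherwise `evaluate` raises), and the monotonicity check is a sanity test of the sketch, not a substitute for the paper's proof.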
Funding: The National Natural Science Foundation of China (Grant No. 51376101), the Science Fund for Creative Research Groups (Grant No. 51321002), and the Tsinghua University Initiative Scientific Research Program.
Abstract: Electrical power generation from low-temperature heat sources is attracting increasing attention, but the temperature mismatch between the heat source and the working medium in the organic Rankine cycle (ORC) has become an issue. The organic flash cycle (OFC) is an effective solution to this issue. In this paper, the OFC is analyzed using the concept of entransy loss and the T-Q (temperature versus heat flow rate) diagram for heat-work conversion. Equations are derived theoretically for the basic OFC and for an OFC whose heat source is the turbine exhaust gas of a Brayton cycle (the combined cycle). The results indicate that, with prescribed inlet parameters of the hot stream, a larger entransy loss rate leads to a larger output power in the cases discussed, which the T-Q diagram displays intuitively. Two numerical examples demonstrate that the optimal mass flow rate of the working medium for the maximum entransy loss rate is the same as that for the maximum output power. The T-Q diagram analysis agrees with the numerical results. The concept of entransy loss can serve as a criterion for OFC optimization.
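The T-Q diagram bookkeeping can be sketched numerically. The entransy flow carried by a heat flow is the integral of T dQ, i.e. the area under the hot-stream line in a T-Q diagram; for a stream with constant specific heat cooling from T_in to T_out it equals m c_p (T_in^2 - T_out^2)/2. The loss balance used below, G_loss = G_hot - (Q_hot - W) T_0 with heat rejected at a fixed ambient temperature T_0, is an assumed simplification and not the OFC equations derived in the paper; the stream data and function names are hypothetical.

```python
# Hypothetical sketch: entransy flow as the area under a hot stream's T-Q line,
# plus an ASSUMED simplified entransy-loss balance (not the paper's OFC equations).


def tq_curve(m_dot, cp, t_in, t_out, n=100):
    """(Q, T) points for a hot stream cooling linearly from t_in to t_out.

    Q is the cumulative heat flow rate released (W); T is in K; cp is assumed constant.
    """
    points = []
    for i in range(n + 1):
        t = t_in - (t_in - t_out) * i / n
        points.append((m_dot * cp * (t_in - t), t))
    return points


def entransy_flow(points):
    """Integral of T dQ (W*K) under the T-Q curve, by the trapezoidal rule."""
    return sum((qb - qa) * (ta + tb) / 2 for (qa, ta), (qb, tb) in zip(points, points[1:]))


def entransy_loss_rate(g_hot, q_hot, power, t_amb):
    """ASSUMED balance: G_loss = G_hot - Q_rej * T_amb, with Q_rej = q_hot - power."""
    return g_hot - (q_hot - power) * t_amb


if __name__ == "__main__":
    # Illustrative numbers only (hypothetical exhaust-gas stream).
    m_dot, cp, t_in, t_out, t_amb = 10.0, 1100.0, 773.15, 373.15, 298.15
    pts = tq_curve(m_dot, cp, t_in, t_out)
    g_hot = entransy_flow(pts)
    q_hot = pts[-1][0]
    closed_form = m_dot * cp * (t_in**2 - t_out**2) / 2
    print(f"Q_hot = {q_hot:.4g} W, G_hot = {g_hot:.6g} W*K (closed form {closed_form:.6g})")
    for power in (0.8e6, 1.2e6):
        g_loss = entransy_loss_rate(g_hot, q_hot, power, t_amb)
        print(f"W = {power:.2g} W -> G_loss = {g_loss:.6g} W*K")
```

With Q_hot and T_0 fixed, this assumed balance makes G_loss rise by T_0 for each extra watt of output power. In the paper's cases the hot-stream outlet temperature also changes with the working-medium mass flow rate, so this fixed-heat comparison only illustrates the bookkeeping.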