A stochastic resource allocation model, based on the principles of Markov decision processes (MDPs), is proposed in this paper. In particular, a general-purpose framework is developed that takes into account resource requests for both instant and future needs. The framework can handle two types of reservations (i.e., specified and unspecified time-interval reservation requests) and can implement an overbooking business strategy to further increase revenues. The resulting dynamic pricing problems can be regarded as sequential decision-making problems under uncertainty, which are solved by means of stochastic dynamic programming (DP) based algorithms. In this regard, Bellman's backward principle of optimality is exploited to provide all the implementation mechanisms for the proposed reservation pricing algorithm. The curse of dimensionality, an inevitable issue in DP, arises for both instant resource requests and future resource reservations. To address this scalability issue, an approximate dynamic programming (ADP) technique based on linear function approximation is applied. Several examples are provided to show the effectiveness of the proposed approach.
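To make the backward recursion concrete, here is a minimal sketch of Bellman's backward principle of optimality applied to a toy single-resource reservation pricing problem, followed by a least-squares linear fit of the resulting value function to illustrate the ADP step. The horizon, capacity, price menu, and logistic acceptance model are all hypothetical stand-ins; the paper's full model additionally handles advance reservations and overbooking.

```python
import numpy as np

# Minimal sketch: backward DP for single-resource reservation pricing.
# Hypothetical parameters; the paper's model also covers advance
# reservations (specified/unspecified intervals) and overbooking.
T, C = 50, 10                        # decision epochs, resource units
prices = np.array([1.0, 2.0, 3.0])   # admissible price menu
p_arrival = 0.6                      # request arrival probability per epoch

def accept_prob(price):
    """Illustrative price-sensitive acceptance probability."""
    return 1.0 / (1.0 + np.exp(price - 2.0))

# V[t, c] = optimal expected revenue-to-go with c units left at epoch t
V = np.zeros((T + 1, C + 1))
policy = np.zeros((T, C + 1))

for t in range(T - 1, -1, -1):       # Bellman backward recursion
    for c in range(1, C + 1):
        best, best_p = -np.inf, prices[0]
        for p in prices:
            a = accept_prob(p)
            q = (p_arrival * (a * (p + V[t + 1, c - 1]) + (1 - a) * V[t + 1, c])
                 + (1 - p_arrival) * V[t + 1, c])
            if q > best:
                best, best_p = q, p
        V[t, c], policy[t, c] = best, best_p

print("expected revenue from full capacity:", V[0, C])

# ADP flavor: replace the table V[t, :] by a linear architecture
# V_hat(c) = theta @ phi(c), fitted here by least squares for t = 0.
phi = lambda c: np.array([1.0, c, c * c])
A = np.stack([phi(c) for c in range(C + 1)])
theta, *_ = np.linalg.lstsq(A, V[0], rcond=None)
print("linear value function fit at t = 0:", A @ theta)
```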
In this paper, an improved PID-neural network (IPIDNN) structure is proposed and applied to the critic and action networks of direct heuristic dynamic programming (DHDP). As an online learning algorithm of approximate dynamic programming (ADP), DHDP has demonstrated its applicability to large state and control problems. Theoretically, the DHDP algorithm requires access to full state feedback in order to obtain solutions to the Bellman optimality equation. Unfortunately, it is not always possible to access all the states in a real system. This paper proposes a solution by suggesting an IPIDNN configuration to construct the critic and action networks so as to achieve output feedback control. Since this structure can estimate the integrals and derivatives of measurable outputs, more system states are utilized and thus better control performance is expected. Compared with the traditional PIDNN, this configuration is flexible and easy to expand. Based on this structure, a gradient descent algorithm for the IPIDNN-based DHDP is presented. Convergence issues are addressed within a single learning time step and for the entire learning process. Some important insights are provided to guide the implementation of the algorithm. The proposed learning controller has been applied to a cart-pole system to validate the effectiveness of the structure and the algorithm.
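For reference, the sketch below shows the standard DHDP critic and action updates with plain gradient descent, using generic one-hidden-layer tanh networks, a placeholder plant, and a binary failure signal, all of which are illustrative assumptions. The paper replaces these generic networks with the IPIDNN structure and trains from measurable outputs rather than the full state.

```python
import numpy as np

# Minimal sketch of the standard DHDP critic/action gradient-descent
# updates. Generic tanh networks, placeholder plant, and binary failure
# reward are illustrative; the paper substitutes the IPIDNN structure.
rng = np.random.default_rng(0)
nx, nh = 4, 8                                # state and hidden-layer sizes
alpha, lc, la, Uc = 0.95, 0.01, 0.01, 0.0    # discount, rates, objective

Wc1 = rng.normal(0, 0.1, (nh, nx + 1))       # critic input: (state, action)
Wc2 = rng.normal(0, 0.1, nh)
Wa1 = rng.normal(0, 0.1, (nh, nx))
Wa2 = rng.normal(0, 0.1, nh)

def action(x):
    h = np.tanh(Wa1 @ x)
    return np.tanh(Wa2 @ h), h               # scalar control, hidden layer

def critic(x, u):
    z = np.append(x, u)
    h = np.tanh(Wc1 @ z)
    return Wc2 @ h, h, z                     # cost-to-go estimate J(t)

def plant(x, u):                             # placeholder for cart-pole
    return 0.95 * x + 0.1 * u * np.ones_like(x)

def reward(x):                               # 0 on success, -1 on failure
    return -float(np.linalg.norm(x) > 2.0)

x, J_prev = rng.normal(0, 0.1, nx), 0.0
for t in range(200):
    u, ha = action(x)
    J, hc, z = critic(x, u)
    # critic error: e_c(t) = alpha*J(t) - [J(t-1) - r(t)]
    ec = alpha * J - (J_prev - reward(x))
    dJ_du = float(np.sum(Wc2 * (1 - hc**2) * Wc1[:, -1]))   # dJ/du
    gWc2 = ec * alpha * hc
    gWc1 = ec * alpha * np.outer(Wc2 * (1 - hc**2), z)
    Wc2 -= lc * gWc2
    Wc1 -= lc * gWc1
    # action error: e_a(t) = J(t) - Uc, backpropagated through the critic
    ea = J - Uc
    gWa2 = ea * dJ_du * (1 - u**2) * ha
    gWa1 = ea * dJ_du * np.outer((1 - u**2) * Wa2 * (1 - ha**2), x)
    Wa2 -= la * gWa2
    Wa1 -= la * gWa1
    J_prev, x = J, plant(x, u)
```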
As more and more offshore wind power is connected to power grids, fluctuations in offshore wind speeds result in risks of high operation costs. To mitigate this problem, a risk-averse stochastic economic dispatch (ED) model of a power system with multiple offshore wind farms (OWFs) is proposed in this paper. In this model, a novel GlueVaR method is used to measure the tail risk of the probability distribution of the operation cost. The weighted sum of the expected operation cost and the GlueVaR is used to reflect the risk of the operation cost, and it can flexibly accommodate different risk requirements, including risk aversion and risk neutrality, by adjusting parameters. Then, a risk-averse approximate dynamic programming (ADP) algorithm is designed for solving the proposed model, in which the multi-period ED problem is decoupled into a series of single-period ED problems. Besides, GlueVaR is introduced into the approximate value function training process for risk aversion. Finally, a distributed and risk-averse ADP algorithm is constructed based on the alternating direction method of multipliers, which further decouples the single-period ED between the transmission system and the multiple OWFs to ensure information privacy. Case studies on the modified IEEE 39-bus system with one OWF and an actual provincial power system with four OWFs demonstrate the correctness and efficiency of the proposed model and algorithm.
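As a rough illustration of the risk measure, the sketch below evaluates GlueVaR on an empirical cost distribution through its piecewise-linear distortion function (following Belles-Sampera et al.) and forms the weighted risk-adjusted objective described above. The confidence levels, shape parameters h1 and h2, the lognormal cost scenarios, and the weight lam are hypothetical choices, not the paper's settings.

```python
import numpy as np

# Minimal sketch: empirical GlueVaR via its distortion function, plus the
# weighted sum of expected cost and GlueVaR. All parameters illustrative.

def gluevar_distortion(u, alpha, beta, h1, h2):
    """Piecewise-linear GlueVaR distortion g(u), with alpha <= beta < 1
    and 0 <= h1 <= h2 <= 1."""
    u = np.asarray(u, dtype=float)
    g = np.where(u < 1 - beta,
                 h1 * u / (1 - beta),
                 h1 + (h2 - h1) * (u - (1 - beta)) / (beta - alpha))
    return np.where(u >= 1 - alpha, 1.0, g)

def empirical_gluevar(costs, alpha=0.90, beta=0.99, h1=0.3, h2=0.7):
    """Choquet integral of the empirical cost distribution under g:
    sum of sorted costs weighted by distortion increments."""
    x = np.sort(np.asarray(costs, dtype=float))[::-1]   # descending costs
    n = len(x)
    u = np.arange(n + 1) / n                            # tail probabilities
    w = np.diff(gluevar_distortion(u, alpha, beta, h1, h2))
    return float(np.dot(w, x))

costs = np.random.default_rng(1).lognormal(3.0, 0.5, 10000)  # cost scenarios
g = empirical_gluevar(costs)
lam = 0.5                                    # hypothetical risk-aversion weight
objective = (1 - lam) * costs.mean() + lam * g
print(f"GlueVaR = {g:.2f}, risk-adjusted objective = {objective:.2f}")
```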
The core task of tracking control is to make the controlled plant track a desired trajectory. The traditional performance index used in previous studies cannot completely eliminate the tracking error as the number of time steps increases. In this paper, a new cost function is introduced to develop the value-iteration-based adaptive critic framework for solving the tracking control problem. Unlike in the regulator problem, the iterative value function of the tracking control problem cannot be regarded as a Lyapunov function. A novel stability analysis method is developed to guarantee that the tracking error converges to zero. The discounted iterative scheme under the new cost function is elaborated for the special case of linear systems. Finally, the tracking performance of the present scheme is demonstrated by numerical results and compared with those of the traditional approaches.
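For the linear special case, the value-iteration backbone can be sketched as a Riccati-like recursion on the augmented state z = [x; r], with the stage cost penalizing the tracking error e = x - r. The plant, reference dynamics, and weighting matrices below are illustrative, and the sketch uses the standard discounted quadratic cost rather than the paper's new cost function, whose modified form is what guarantees zero tracking error.

```python
import numpy as np

# Minimal sketch: discounted value iteration for linear tracking on the
# augmented state z = [x; r]. Illustrative matrices; the stage cost is the
# standard e'Qe + u'Ru with e = x - r, not the paper's new cost function.
A = np.array([[1.0, 0.1], [0.0, 0.98]])        # plant: x+ = A x + B u
B = np.array([[0.0], [0.1]])
F = np.array([[0.999, 0.05], [-0.05, 0.999]])  # reference: r+ = F r
Q, R, gamma = np.eye(2), np.array([[0.1]]), 0.95

n = A.shape[0]
Az = np.block([[A, np.zeros((n, n))], [np.zeros((n, n)), F]])
Bz = np.vstack([B, np.zeros((n, B.shape[1]))])
M = np.hstack([np.eye(n), -np.eye(n)])         # tracking error e = M z
Qz = M.T @ Q @ M

P = np.zeros((2 * n, 2 * n))                   # V_0 = 0 gives monotone VI
for i in range(2000):
    # greedy gain for V_i, then the Bellman update for V_{i+1}
    K = -gamma * np.linalg.solve(R + gamma * Bz.T @ P @ Bz, Bz.T @ P @ Az)
    P_new = Qz + K.T @ R @ K + gamma * (Az + Bz @ K).T @ P @ (Az + Bz @ K)
    if np.max(np.abs(P_new - P)) < 1e-10:
        break
    P = P_new

print(f"value iteration stopped at i = {i}; feedback gain K =\n{K}")
```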
Funding: Supported in part by the National Natural Science Foundation of China (61533017, 61273140, 61304079, 61374105, 61379099, 61233001), the Fundamental Research Funds for the Central Universities (FRF-TP-15-056A3), and the Open Research Project from SKLMCCS (20150104).
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA04Z183), the National Natural Science Foundation of China (60621001, 60534010, 60572070, 60774048, 60728307), and the Program for Changjiang Scholars and Innovative Research Groups of China (60728307, 4031002).
Funding: Supported by the National Natural Science Foundation of China under Cooperative Research Funds (No. 50828701); the third author is also supported by the U.S. National Science Foundation (No. ECCS-0702057).
Funding: Supported by the Key Research and Development Project of Guangdong Province (2021B0101230004) and the National Natural Science Foundation of China (51977080).
Funding: This work was supported in part by the Beijing Natural Science Foundation (JQ19013), the National Key Research and Development Program of China (2021ZD0112302), and the National Natural Science Foundation of China (61773373).