Funding: This paper was prepared with the support of the National Youth Science Foundation.
Abstract: This paper deals with continuous-time Markov decision programming (CTMDP) with unbounded reward rate. The economic criterion is the long-run average reward. For models with a countable state space and compact metric action sets, we present a set of sufficient conditions that ensure the existence of stationary optimal policies.
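For reference, the long-run average reward criterion mentioned above has a standard form; the display below is a generic formulation, and the symbols (the value function notation, r, x(t), a(t)) are our own, not taken from the paper:

```latex
% Long-run average reward of policy \pi from initial state i
% (standard CTMDP formulation; notation chosen for illustration).
\[
  \bar V(\pi, i) \;=\; \liminf_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}_i^{\pi}\!\left[ \int_0^{T} r\bigl(x(t), a(t)\bigr)\, dt \right]
\]
% Here r(x,a) is the (possibly unbounded) reward rate, and x(t), a(t)
% are the state and action processes under \pi. A stationary policy
% \pi^* is average optimal if \bar V(\pi^*, i) \ge \bar V(\pi, i)
% for every policy \pi and every initial state i.
```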
基金supported by the National Natural Science Foundation of China under Grant Nos.61374080 and 61374067the Natural Science Foundation of Zhejiang Province under Grant No.LY12F03010+1 种基金the Natural Science Foundation of Ningbo under Grant No.2012A610032Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions
Abstract: This paper studies the strong n (n = -1, 0)-discount and finite horizon criteria for continuous-time Markov decision processes in Polish spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. Under mild conditions, the authors prove the existence of strong n (n = -1, 0)-discount optimal stationary policies by developing two equivalence relations: one between the standard expected average reward and strong -1-discount optimality, and the other between the bias and strong 0-discount optimality. The authors also prove the existence of an optimal policy for a finite horizon control problem by developing an interesting characterization of a canonical triplet.
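For context, one standard formulation of strong n-discount optimality (the notation below is ours, not the paper's) declares a policy \pi^* strong n-discount optimal when:

```latex
% One common definition of strong n-discount optimality (n = -1, 0);
% variants exist in the literature, so this is illustrative only.
\[
  \liminf_{\alpha \downarrow 0}\ \alpha^{-n}
  \bigl( V_\alpha(\pi^{*}, i) - V_\alpha(\pi, i) \bigr) \;\ge\; 0
  \quad \text{for all policies } \pi \text{ and states } i,
\]
% where the \alpha-discounted value is
% V_\alpha(\pi, i) = \mathbb{E}_i^{\pi}\!\left[
%   \int_0^{\infty} e^{-\alpha t}\, r\bigl(x(t), a(t)\bigr)\, dt \right].
```

Under this convention, strong -1-discount optimality recovers average optimality and strong 0-discount optimality recovers bias optimality, which is the pair of equivalences the abstract develops.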
基金supported in part by the State Key Laboratory of HVDC(No.SKLHVDC-2021-KF-09)in part by the National Natural Science Foundation of China(No.51977081).
Abstract: This paper studies the rolling security-constrained unit commitment (RSCUC) problem with AC power flow and uncertainties. This NP-hard problem is modeled as a Markov decision process, which is then solved by a transfer-based approximate dynamic programming (TADP) algorithm proposed in this paper. Unlike traditional approximate dynamic programming (ADP) algorithms, TADP can fix the commitment states of most units in advance through a decision transfer technique, significantly reducing the action space. Moreover, whereas traditional ADP algorithms must determine the commitment state of every unit, TADP only needs to decide the unit with the smallest on-state probability among all on-state units, further reducing the action space (see the sketch below). The proposed algorithm also avoids the iterative update of value functions and the reliance on rolling forecast information, which is better suited to the rolling decision-making process of RSCUC. Finally, numerical simulations are carried out on a modified IEEE 39-bus system and a real 2778-bus system to demonstrate the effectiveness of the proposed algorithm.
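To make the decision transfer concrete, here is a minimal Python sketch of the single binary decision it leaves open at each step; the helper names (transfer_decision, on_units, on_state_prob) are illustrative assumptions, not the paper's actual interface:

```python
# Minimal sketch of the action-space reduction described above.
# All names and probabilities here are made up for illustration.

def transfer_decision(on_units, on_state_prob):
    """Decision transfer step: instead of deciding every unit's
    commitment state (an action space of size 2**N), fix all units
    except the on-state unit with the smallest on-state probability,
    leaving a single binary decision (keep it on / switch it off)."""
    # Candidate: the on-state unit whose state is least certain.
    candidate = min(on_units, key=lambda u: on_state_prob[u])
    # Every other on-state unit keeps its current commitment state.
    fixed = {u: 1 for u in on_units if u != candidate}
    return candidate, fixed


# Illustrative usage with made-up probabilities:
on_units = ["G1", "G2", "G3"]
on_state_prob = {"G1": 0.98, "G2": 0.41, "G3": 0.87}
candidate, fixed = transfer_decision(on_units, on_state_prob)
print(candidate)  # -> "G2": the only unit whose state must be decided
print(fixed)      # -> {"G1": 1, "G3": 1}: states transferred in advance
```

The design point is that the per-step action space shrinks from exponential in the number of units to a single binary choice, which is what makes the approximate dynamic program tractable.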