Abstract: In this paper we discuss discrete, time non-homogeneous, discounted Markovian decision programming in which the state space and all action sets are countable. Assuming that the optimal value function is finite, we give necessary and sufficient conditions for the existence of an optimal policy. Under the assumption that the absolute mean of rewards is relatively bounded, we also give necessary and sufficient conditions for the existence of an optimal policy.
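As an illustration of the setting only (not of the paper's existence conditions), the following sketch computes a Markov, time-dependent policy for a time non-homogeneous discounted model by backward induction. It assumes finite state and action sets in place of countable ones. It also truncates the infinite horizon at a hypothetical `horizon` with terminal value 0, and it uses hypothetical callables `reward(t, s, a)` and `trans(t, s, a)` for the stage-dependent rewards and transition probabilities.

```python
from typing import Callable, Dict, List, Tuple

State = int
Action = int
RewardFn = Callable[[int, State, Action], float]              # r_t(s, a)
TransFn = Callable[[int, State, Action], Dict[State, float]]  # p_t(. | s, a)


def backward_induction(
    states: List[State],
    actions: Callable[[State], List[Action]],
    reward: RewardFn,
    trans: TransFn,
    beta: float,
    horizon: int,
) -> Tuple[List[Dict[State, float]], List[Dict[State, Action]]]:
    """Return stage values V_t and a Markov policy f_t for t = 0, ..., horizon - 1.

    V_t(s) = max_a [ r_t(s, a) + beta * sum_{s'} p_t(s' | s, a) V_{t+1}(s') ],
    with V_horizon = 0 (a truncation of the infinite horizon).
    """
    values: List[Dict[State, float]] = [{} for _ in range(horizon + 1)]
    policy: List[Dict[State, Action]] = [{} for _ in range(horizon)]
    values[horizon] = {s: 0.0 for s in states}
    for t in range(horizon - 1, -1, -1):
        for s in states:
            # One-step lookahead value of each admissible action at stage t.
            q = {
                a: reward(t, s, a)
                + beta * sum(p * values[t + 1][s2] for s2, p in trans(t, s, a).items())
                for a in actions(s)
            }
            best = max(q, key=q.__getitem__)
            values[t][s] = q[best]
            policy[t][s] = best
    return values, policy


# Usage: rewards favour action (t % 2), so the optimal action changes with t.
values, policy = backward_induction(
    states=[0, 1],
    actions=lambda s: [0, 1],
    reward=lambda t, s, a: 1.0 if a == t % 2 else 0.0,
    trans=lambda t, s, a: {a: 1.0},  # the action chooses the next state
    beta=0.5,
    horizon=2,
)
print(policy)  # [{0: 0, 1: 0}, {0: 1, 1: 1}]
```

The decision rule differs between stages, which is exactly what "Markov but non-stationary" means in the time non-homogeneous case.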
Funding: The project is supported by the National Natural Science Foundation of China.
Abstract: In this paper, we discuss Markovian decision programming with recursive vector rewards and give an algorithm for finding optimal policies. We prove that (1) there is a Markovian optimal policy in the nonstationary case, and (2) there is a stationary optimal policy in the stationary case.
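For the stationary case, a minimal sketch of value iteration that returns a stationary greedy policy is given here. It is a simplification under stated assumptions, not the paper's algorithm. The recursive vector reward is replaced by an ordinary discounted vector reward collapsed to a scalar with hypothetical `weights` (a weighted-sum scalarization), and state and action sets are finite.

```python
from typing import Dict, List, Sequence, Tuple

SA = Tuple[int, int]  # (state, action)


def value_iteration(
    n_states: int,
    n_actions: int,
    reward_vec: Dict[SA, Sequence[float]],  # (s, a) -> reward vector
    trans: Dict[SA, Dict[int, float]],       # (s, a) -> {s': probability}
    weights: Sequence[float],                # hypothetical scalarization weights
    beta: float,
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> Tuple[List[float], List[int]]:
    """Value iteration on a weighted-sum scalarization; returns (V, stationary policy)."""
    # Assumption: collapse each reward vector to a scalar by a fixed weighted sum.
    r = {sa: sum(w * x for w, x in zip(weights, vec)) for sa, vec in reward_vec.items()}

    def q_value(v: List[float], s: int, a: int) -> float:
        return r[(s, a)] + beta * sum(p * v[s2] for s2, p in trans[(s, a)].items())

    v = [0.0] * n_states
    for _ in range(max_iter):
        new_v = [max(q_value(v, s, a) for a in range(n_actions)) for s in range(n_states)]
        gap = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if gap < tol:
            break
    # The same decision rule is used at every stage: a stationary policy.
    policy = [max(range(n_actions), key=lambda a: q_value(v, s, a)) for s in range(n_states)]
    return v, policy


# Usage: state 1 pays (1, 2); every action moves deterministically to state a.
rv = {(s, a): (1.0, 2.0) if s == 1 else (0.0, 0.0) for s in range(2) for a in range(2)}
tr = {(s, a): {a: 1.0} for s in range(2) for a in range(2)}
v, pol = value_iteration(2, 2, rv, tr, weights=(0.5, 0.5), beta=0.5)
print(pol)  # [1, 1]
```

The weighted sum is only one way to order vector rewards; a lexicographic or Pareto ordering would need a different comparison in place of `max`.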
Funding: This work was supported in part by the National Natural Science Foundation of China (61922076, 61725304, 61873252, 61991403, 61991400) and in part by the Australian Research Council Discovery Program (DP200101199).
Abstract: This paper studies price-based residential demand response management (PB-RDRM) in smart grids, in which both non-dispatchable and dispatchable loads (including general loads and plug-in electric vehicles (PEVs)) are involved. The PB-RDRM is formulated as a bi-level optimization problem: the upper-level dynamic retail pricing problem aims to maximize the profit of a utility company (UC) by selecting optimal retail prices (RPs), while the lower-level demand response (DR) problem aims to minimize the comprehensive cost of loads by coordinating their energy consumption behavior. The challenges are mainly twofold: 1) the uncertainty of energy consumption and RPs; and 2) the temporally coupled constraints of flexible PEVs, which make it impossible to directly develop a model-based optimization algorithm to solve the PB-RDRM. To address these challenges, we first model the dynamic retail pricing problem as a Markovian decision process (MDP), and then employ a model-free reinforcement learning (RL) algorithm to learn the optimal dynamic RPs of the UC according to the loads’ responses. Our proposed RL-based DR algorithm is benchmarked against two model-based optimization approaches (i.e., a distributed dual decomposition-based (DDB) method and a distributed primal-dual interior (PDI)-based method), which require exact load and electricity price models. The comparison results show that, compared with the benchmark solutions, our proposed algorithm not only adaptively decides the RPs through online learning, but also achieves larger social welfare in an unknown electricity market environment.
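A toy illustration of the model-free idea is tabular Q-learning with the hour of day as the state and a discrete retail price as the action. Everything here is hypothetical: the prices, the wholesale costs, the linear price response of demand hidden inside `step`, and the learning parameters. The sketch omits PEVs, the bi-level structure, and social welfare, and it is not the paper's RL-based DR algorithm. The learner only observes the profit returned by `step`, never the demand model itself.

```python
import random
from typing import List

# Hypothetical toy market (all numbers illustrative, not from the paper).
PRICES = [0.10, 0.20, 0.30]       # candidate retail prices per kWh (actions)
WHOLESALE = [0.05, 0.12, 0.08]    # UC purchase cost per kWh for each hour
BASE_DEMAND = [10.0, 20.0, 15.0]  # kWh demanded at zero price for each hour
ELASTICITY = 2.5                  # assumed linear price sensitivity of loads


def step(hour: int, price_idx: int, rng: random.Random, noise: float) -> float:
    """Simulated load response; returns the UC's profit for one hour.

    The learner never inspects this model, it only sees the returned profit.
    """
    price = PRICES[price_idx]
    demand = BASE_DEMAND[hour] * (1.0 - ELASTICITY * price) + rng.gauss(0.0, noise)
    return (price - WHOLESALE[hour]) * max(0.0, demand)


def q_learning(episodes: int = 5000, alpha: float = 0.1, gamma: float = 0.95,
               epsilon: float = 0.2, noise: float = 0.0, seed: int = 0) -> List[List[float]]:
    """Tabular Q-learning; the state is the hour and each episode is one day."""
    rng = random.Random(seed)
    horizon, n_actions = len(BASE_DEMAND), len(PRICES)
    q = [[0.0] * n_actions for _ in range(horizon)]
    for _ in range(episodes):
        for hour in range(horizon):
            # Epsilon-greedy choice over the discrete price set.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=q[hour].__getitem__)
            profit = step(hour, a, rng, noise)
            # The episode ends after the last hour, so there is no future value then.
            future = max(q[hour + 1]) if hour + 1 < horizon else 0.0
            q[hour][a] += alpha * (profit + gamma * future - q[hour][a])
    return q


def greedy_prices(q: List[List[float]]) -> List[float]:
    """Learned retail price for each hour (argmax of Q)."""
    return [PRICES[max(range(len(row)), key=row.__getitem__)] for row in q]


print(greedy_prices(q_learning()))  # [0.2, 0.3, 0.2] for these toy numbers
```

Setting `noise` above zero mimics uncertain consumption. The agent still learns from sampled profits alone, which is the property that distinguishes this approach from the model-based benchmarks.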
Abstract: The present research is based on a comprehensive survey that discusses the barely tolerable water level of Lake Balaton between 2000 and 2003. The low water level during this extreme period caused considerable problems for recreation. Our goal was to investigate possible water transfer policies and the water level regulation policy of Lake Balaton by applying the dynamic programming of Markov chains. This iterative method supports the cost-benefit analysis of different scenarios and also provides information about the best water management policy. As the basis of our analysis, Markov chains were created using an ARMA (autoregressive moving average) synthetic data generator. For the economic analysis, a profit was attached to each transition probability. In our case the profit was negative, because it represents the estimated harmful effects of the low water level, based on the calculated willingness-to-pay for improving the water quality of Lake Balaton. In addition, the profit includes the cost of the different water supplement scenarios. Once implemented as a computer program, the method proved to be an efficient tool to support the cost-benefit analysis of water supplement scenarios. The results highlight the importance of further climate change monitoring. The calculations confirmed that water transfer is cost-effective, yet scenarios with less ecological risk are also effective and are therefore preferable.
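The step of building a Markov chain from synthetic data and attaching a profit to each transition can be sketched as follows. The sketch assumes an AR(1) generator (the simplest ARMA special case) rather than the study's fitted ARMA model. It also assumes hypothetical bin edges for discretizing water levels and a user-supplied profit matrix, and it uses no Balaton data.

```python
import bisect
import random
from typing import List


def ar1_series(n: int, mean: float, phi: float, sigma: float, seed: int = 0) -> List[float]:
    """Synthetic series x_t = mean + phi * (x_{t-1} - mean) + e_t, e_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    xs = [mean]
    for _ in range(n - 1):
        xs.append(mean + phi * (xs[-1] - mean) + rng.gauss(0.0, sigma))
    return xs


def discretize(series: List[float], edges: List[float]) -> List[int]:
    """Map each value to a state index; `edges` are sorted bin boundaries."""
    return [bisect.bisect_right(edges, x) for x in series]


def transition_matrix(states: List[int], n_states: int) -> List[List[float]]:
    """Maximum-likelihood transition probabilities from observed state pairs."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(states, states[1:]):
        counts[i][j] += 1
    # Rows of never-visited states stay all zero.
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


def expected_profit(p: List[List[float]], profit: List[List[float]]) -> List[float]:
    """Expected one-step profit per state: sum_j P(i, j) * R(i, j)."""
    return [sum(pij * rij for pij, rij in zip(prow, rrow)) for prow, rrow in zip(p, profit)]


# Usage with illustrative numbers only (levels in cm relative to a hypothetical datum).
levels = ar1_series(n=1000, mean=100.0, phi=0.8, sigma=10.0)
states = discretize(levels, edges=[90.0, 110.0])  # 3 states: low, normal, high
P = transition_matrix(states, n_states=3)
```

With one transition matrix and one (negative) profit matrix per water supplement scenario, `expected_profit` gives the per-stage rewards that a dynamic programming recursion over scenarios would then compare.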
Funding: The authors acknowledge Dr Ali Benssam for his invaluable support during all the steps of the project and in the writing of the paper.
Abstract: Purpose – The purpose of this paper is to study a multiple-origin-multiple-destination variant of the dynamic critical nodes detection problem (DCNDP) and the dynamic critical links detection problem (DCLDP) in stochastic networks. DCNDP and DCLDP consist of identifying the subset of nodes and links, respectively, whose deletion maximizes the stochastic shortest paths between all origin–destination pairs in the graph modeling the transport network. Identifying such nodes (or links) helps to better control road traffic and to anticipate the measures necessary to avoid congestion.
Design/methodology/approach – A Markovian decision process is used to model the shortest path problem under dynamic traffic conditions. Effective algorithms to determine the critical nodes (links) while considering the dynamicity of the traffic network are provided. Sensitivity analysis with respect to capacity reduction for critical links is also studied. Moreover, the complexity of the underlying algorithms is analyzed, and the computational efficiency gained by decomposing the network into communities is highlighted.
Findings – The numerical results demonstrate that using the dynamic (time-dependent) shortest path as a metric has a significant impact on the identification of critical nodes/links, and the experiments conducted on real-world networks highlight the importance of sensitive links for dynamically detecting critical links and elaborating smart transport plans.
Research limitations/implications – The research in this paper also revealed several challenges that call for future investigation. First, the authors restricted their experimentation to a small network, focusing only on the model behavior, in the absence of historical data; they intend to extend this study to very large networks using real data. Second, the authors considered only congestion to assess network criticality; future research on this topic may include other factors, mainly vulnerability.
Practical implications – Taking the dynamic and stochastic nature of the problem into account in the modeling makes the models effective tools for real-time control of transportation networks. This leads to optimized smart transport plans, particularly in disaster management, improving emergency evacuation efficiency.
Originality/value – The paper provides a novel approach to solving critical nodes/links detection problems. In contrast to the majority of research works in the literature, the proposed model considers dynamicity and betweenness while taking into account the stochastic aspect of transport networks. This enables the approach to guide traffic and analyze transport networks, mainly under disaster conditions in which networks become highly dynamic.
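A brute-force sketch of critical node detection for multiple origin–destination pairs tries every set of k intermediate nodes and keeps the one whose deletion maximizes the summed shortest-path cost. This simplifies the paper's setting. It uses static expected travel times instead of a Markovian decision process over dynamic, stochastic conditions, and it charges a hypothetical `penalty` for disconnected pairs. All names and the example graph are illustrative.

```python
import heapq
import itertools
from typing import Dict, FrozenSet, List, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {successor: expected travel time}


def shortest_costs(graph: Graph, source: str, removed: FrozenSet[str]) -> Dict[str, float]:
    """Dijkstra from `source`, skipping deleted nodes."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if v in removed:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def od_cost(graph: Graph, od_pairs: List[Tuple[str, str]], removed: FrozenSet[str],
            penalty: float) -> float:
    """Sum of O-D shortest-path costs; unreachable pairs cost `penalty`."""
    return sum(shortest_costs(graph, o, removed).get(d, penalty) for o, d in od_pairs)


def critical_nodes(graph: Graph, od_pairs: List[Tuple[str, str]], k: int,
                   penalty: float = 1e6) -> Tuple[FrozenSet[str], float]:
    """Exhaustively pick k non-endpoint nodes whose deletion maximizes od_cost."""
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    endpoints = {n for pair in od_pairs for n in pair}
    candidates = sorted(nodes - endpoints)
    best_set: FrozenSet[str] = frozenset()
    best_cost = od_cost(graph, od_pairs, best_set, penalty)
    for combo in itertools.combinations(candidates, k):
        cost = od_cost(graph, od_pairs, frozenset(combo), penalty)
        if cost > best_cost:  # keeps the empty set if no deletion hurts
            best_set, best_cost = frozenset(combo), cost
    return best_set, best_cost


# Usage on a hypothetical 5-node road graph with one O-D pair.
roads: Graph = {
    "A": {"B": 1.0, "C": 2.0, "E": 5.0},
    "B": {"D": 1.0},
    "C": {"D": 2.0},
    "E": {"D": 5.0},
}
print(critical_nodes(roads, [("A", "D")], k=1))  # (frozenset({'B'}), 4.0)
```

Exhaustive search grows combinatorially with k and the number of candidate nodes, so this form only suits small graphs. The same loop over edge subsets would give a links variant.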