Abstract: A real-time pricing system of electricity charges different electricity prices for different hours of the day and on different days, and is effective for reducing peak demand and flattening the load curve. In this paper, using a Markov decision process (MDP), we propose a modeling method and an optimal control method for real-time pricing systems. First, the outline of real-time pricing systems is explained. Next, a model of a set of customers is derived as a multi-agent MDP. Furthermore, the optimal control problem is formulated and reduced to a quadratic programming problem. Finally, a numerical simulation is presented.
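The abstract reduces its optimal control problem to a quadratic program. As a minimal sketch of that final step only, the snippet below solves a generic QP with SciPy; the matrices Q, c, A, b are illustrative placeholders, not the paper's pricing model.

```python
# Hedged sketch: solving a generic quadratic program
#     minimize (1/2) x^T Q x + c^T x   subject to  A x <= b
# with SciPy.  All data below are placeholders for illustration.
import numpy as np
from scipy.optimize import minimize

Q = np.array([[2.0, 0.5], [0.5, 1.0]])   # assumed positive-definite cost matrix
c = np.array([-1.0, -0.5])               # assumed linear cost term
A = np.array([[1.0, 1.0]])               # assumed constraint: x1 + x2 <= 1
b = np.array([1.0])

objective = lambda x: 0.5 * x @ Q @ x + c @ x
constraints = [{"type": "ineq", "fun": lambda x: b - A @ x}]  # b - A x >= 0

res = minimize(objective, x0=np.zeros(2), constraints=constraints)
print(res.x)  # approximate QP minimizer
```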
Funding: Supported by the National Natural Science Foundation of China (10801056) and the Natural Science Foundation of Ningbo (2010A610094).
Abstract: This paper studies the limit average variance criterion for continuous-time Markov decision processes in Polish spaces. Based on two approaches, the paper proves not only the existence of solutions to the variance minimization optimality equation and the existence of a canonical variance-minimal policy, but also the existence of solutions to the two variance minimization optimality inequalities and the existence of a variance-minimal policy which may not be canonical. An example is given to illustrate all of the conditions.
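For orientation, a common formalization of the limit average variance criterion is sketched below; the notation is assumed and the paper's exact definitions and assumptions may differ.

```latex
% Hedged sketch of the limit average variance criterion for a CTMDP with
% cost rate c (notation assumed, not taken verbatim from the paper).
\[
  \eta(\pi,x) = \limsup_{T\to\infty}\frac{1}{T}\,
    \mathbb{E}^{\pi}_{x}\!\left[\int_0^T c(x_t,a_t)\,dt\right],
  \qquad
  V(\pi,x) = \limsup_{T\to\infty}\frac{1}{T}\,
    \mathbb{E}^{\pi}_{x}\!\left[\int_0^T
      \bigl(c(x_t,a_t)-\eta(\pi,x)\bigr)^{2}\,dt\right].
\]
% The criterion seeks, among average-cost optimal policies, one minimizing V.
```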
Abstract: This paper considers the variance optimization problem for the average reward in continuous-time Markov decision processes (MDPs). It is assumed that the state space is countable and the action space is a Borel measurable space. The main purpose of this paper is to find the policy with the minimal variance in the deterministic stationary policy space. Unlike the traditional Markov decision process, the cost function in the variance criterion is affected by future actions. To this end, we convert the variance minimization problem into a standard MDP by introducing a concept called pseudo-variance. Further, by giving a policy iteration algorithm for the pseudo-variance optimization problem, the optimal policy of the original variance optimization problem is derived, and a sufficient condition for the variance optimal policy is given. Finally, we use an example to illustrate the conclusions of this paper.
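The abstract solves the pseudo-variance problem by policy iteration. As a minimal illustration of the generic algorithm only (finite states and actions, discounted reward; the paper's pseudo-variance cost and continuous-time setting are not reproduced), a sketch:

```python
# Hedged sketch: policy iteration for a *finite, discounted* MDP.
# P[a, s, s'] are transition probabilities, r[a, s] one-step rewards
# (assumed inputs); this only illustrates the generic algorithm.
import numpy as np

def policy_iteration(P, r, gamma=0.9):
    nA, nS, _ = P.shape
    policy = np.zeros(nS, dtype=int)
    while True:
        # policy evaluation: solve (I - gamma * P_pi) v = r_pi
        P_pi = P[policy, np.arange(nS), :]
        r_pi = r[policy, np.arange(nS)]
        v = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
        # policy improvement: greedy action for each state
        q = r + gamma * P @ v            # q[a, s]
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy
```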
Abstract: This article studies controlled Markov processes on Borel spaces that are stationary and homogeneous, in discrete time with an infinite horizon, with bounded cost functions, under the expected total discounted cost criterion. The problem of estimating the stability of this type of process is posed. The central objective is to obtain a bound on the stability index expressed in terms of the Lévy-Prokhorov metric; likewise, sufficient conditions are provided for the existence of such inequalities.
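In this literature the stability index typically measures the extra discounted cost incurred by applying, to the original process, a policy that is optimal for an approximating model. A hedged sketch of the kind of bound sought (symbols assumed, not the article's exact statement):

```latex
% Hedged sketch of a typical stability estimate (symbols assumed).  If
% \tilde f is an optimal stationary policy for an approximating kernel
% \tilde P, and V^{*}, V_{\tilde f} are the optimal and induced discounted
% costs in the original model P, then a bound of the form
\[
  \Delta(x) = V_{\tilde f}(x) - V^{*}(x)
  \;\le\; K(\alpha)\,\pi_{\mathrm{LP}}(P,\tilde P)
\]
% is sought, where \pi_{LP} is the Lévy-Prokhorov distance between the
% transition kernels and K(\alpha) depends on the discount factor \alpha.
```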
Abstract: In this work, for a controlled consumption-investment process with the discounted reward optimization criterion, a numerical estimate of the stability index is made. Using explicit formulas for the optimal stationary policies and for the value functions, the stability index is explicitly calculated, and its asymptotic behavior is investigated through statistical techniques (using numerical experiments) as the discount coefficient approaches 1. The results obtained define the conditions under which an approximate optimal stationary policy can be used to control the original process.
Funding: Supported by the National Natural Science Foundation of China (No. 10801056), the Natural Science Foundation of Ningbo (No. 2010A610094), and the K.C. Wong Magna Fund at Ningbo University.
Abstract: In this paper we study the average sample-path cost (ASPC) problem for continuous-time Markov decision processes in Polish spaces. To the best of our knowledge, this paper is a first attempt to study the ASPC criterion for continuous-time MDPs with Polish state and action spaces. The corresponding transition rates are allowed to be unbounded, and the cost rates may have neither upper nor lower bounds. Under some mild hypotheses, we prove the existence of ε (≥ 0)-ASPC optimal stationary policies based on two different approaches: one is the "optimality equation" approach and the other is the "two optimality inequalities" approach.
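A common formalization of the sample-path average cost is the pathwise (almost-sure) long-run average, roughly as sketched below; the notation is assumed and the paper's precise definition may differ.

```latex
% Hedged sketch of the average sample-path cost (ASPC) criterion
% (notation assumed).  For a policy \pi and cost rate c, the pathwise
% long-run average cost along a sample path \omega is
\[
  J(\pi,x,\omega) = \limsup_{T\to\infty}\frac{1}{T}
    \int_0^T c\bigl(x_t(\omega),a_t(\omega)\bigr)\,dt ,
\]
% and a policy is \varepsilon-ASPC optimal when its pathwise average is,
% almost surely, within \varepsilon of the minimal value.
```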
Abstract: This paper attempts to study the convergence of optimal values and optimal policies of continuous-time Markov decision processes (CTMDPs for short) under constrained average criteria. For a given original CTMDP model M_∞ with denumerable states and a sequence {M_n} of CTMDPs with finite states, we give a new convergence condition to ensure that the optimal values and optimal policies of {M_n} converge to the optimal value and optimal policy of M_∞, respectively, as the state space S_n of M_n converges to the state space S_∞ of M_∞. The transition rates and cost/reward functions of M_∞ are allowed to be unbounded. Our approach can be viewed as a combination of linear programming and Lagrange multiplier methods.
Funding: The research of Ms. L. Iniya is supported by the DST-INSPIRE Fellowship, New Delhi, research award No. DST/INSPIRE Fellowship/[IF190092].
Abstract: In this article, we develop an optimal policy to control the service rate of a discrete-time queueing-inventory system with a finite buffer. The customers arrive according to a Bernoulli process and the service times are geometric. Whenever the buffer reaches its maximum size, any newly arriving customers are considered lost. The customers are served one by one according to the FCFS rule, and each customer requests a random number of items. The inventory is replenished according to an (s,Q) inventory policy with geometric lead time. The main objectives of this article are to determine the service rates to be employed at each slot so that the long-run expected cost rate is minimized for a fixed inventory level and fixed buffer size, and to minimize the expected waiting time for a fixed inventory level and fixed buffer size. The problems are modelled as Markov decision problems. We establish the existence of a stationary policy and employ the linear programming method to find the optimal service rates. We provide some numerical examples to illustrate the behaviour of the model.
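The abstract finds optimal service rates by solving a linear program over a Markov decision model. As a minimal illustration of that generic step (a finite average-cost MDP solved by LP over stationary state-action frequencies, not the paper's queueing-inventory model), a sketch using SciPy:

```python
# Hedged sketch: a generic finite average-cost MDP solved by linear programming
# over state-action frequencies y[s, a].  P and c are random placeholders.
#   minimize   sum_{s,a} c[s,a] y[s,a]
#   subject to sum_a y[s',a] = sum_{s,a} P[s,a,s'] y[s,a]  for all s',
#              sum_{s,a} y[s,a] = 1,  y >= 0.
import numpy as np
from scipy.optimize import linprog

nS, nA = 3, 2
rng = np.random.default_rng(0)
P = rng.random((nS, nA, nS))
P /= P.sum(axis=2, keepdims=True)      # placeholder transition kernel
c = rng.random((nS, nA))               # placeholder one-step costs

# Balance equations; one row is redundant and is replaced by normalization.
A_eq = np.zeros((nS, nS * nA))
for s in range(nS):
    for a in range(nA):
        for s2 in range(nS):
            A_eq[s2, s * nA + a] = (1.0 if s == s2 else 0.0) - P[s, a, s2]
A_eq[-1, :] = 1.0
b_eq = np.zeros(nS)
b_eq[-1] = 1.0

res = linprog(c.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
y = res.x.reshape(nS, nA)
policy = y.argmax(axis=1)  # one way to read a stationary policy off the frequencies
print(policy, res.fun)
```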
Funding: Supported in part by the National Key R&D Program of China (No. 2021YFB3300100) and the National Natural Science Foundation of China (No. 62171062).
Abstract: Effective control of time-sensitive industrial applications depends on the real-time transmission of data from underlying sensors. Quantifying data freshness through the age of information (AoI), in this paper we jointly design sampling and non-slot-based scheduling policies to minimize the maximum time-average age of information (MAoI) among sensors under constraints of average energy cost and finite queue stability. To overcome the intractability arising from the high coupling of such a complex stochastic process, we first focus on the single-sensor time-average AoI optimization problem and convert the constrained Markov decision process (CMDP) into an unconstrained Markov decision process (MDP) by the Lagrangian method. With the infinite-horizon average energy and AoI expressions expanded as a Bellman equation, the single-sensor time-average AoI optimization problem can be approached through the steady-state distribution probability. Further, we propose a low-complexity sub-optimal sampling and semi-distributed scheduling scheme for the multi-sensor scenario. The simulation results show that the proposed scheme reduces the MAoI significantly while achieving a balance between the sampling rate and service rate for multiple sensors.
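The CMDP-to-MDP step the abstract describes is typically done by folding the energy constraint into the objective with a multiplier; a hedged sketch follows (symbols assumed, not the paper's exact notation).

```latex
% Hedged sketch of the Lagrangian relaxation that turns the constrained
% problem into an unconstrained MDP (symbols assumed).  With time-average
% AoI \bar{A}(\pi), time-average energy cost \bar{E}(\pi), energy budget
% E_{\max}, and multiplier \lambda \ge 0, one solves
\[
  \min_{\pi}\; \bar{A}(\pi) + \lambda\bigl(\bar{E}(\pi) - E_{\max}\bigr),
\]
% an ordinary average-cost MDP for each fixed \lambda, and then searches
% over \lambda so that the energy constraint is met.
```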
Funding: Supported by the National Natural Science Foundation of China (Nos. 11931018, 11961005), the Guangdong Province Key Laboratory of Computational Science at Sun Yat-sen University (No. 2020B1212060032), and the Natural Science Foundation of Guangxi Province (No. 2020GXNSFAA297196).
Abstract: This paper is an attempt to study the minimization of the risk probability for piecewise deterministic Markov decision processes (PDMDPs) with unbounded transition rates and Borel spaces. Different from the expected discounted and average criteria in the existing literature, we consider the risk probability that the total rewards produced by the system do not exceed a prescribed goal during a first passage time to some target set, and aim to find a policy that minimizes this risk probability over the class of all history-dependent policies. Under suitable conditions, we derive the optimality equation (OE) for the probability criterion, prove that the value function of the minimization problem is the unique solution to the OE, and establish the existence of ε (≥ 0)-optimal policies. Finally, we provide two examples to illustrate our results.
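Read directly from the abstract's wording, the criterion can be written roughly as below; the notation (target set, reward rate, goal level) is assumed, not quoted from the paper.

```latex
% Hedged formalization of the risk-probability criterion described in the
% abstract (notation assumed).  With target set B, first passage time
% \tau_B, reward rate r, and goal level \lambda, the quantity minimized
% over history-dependent policies \pi is
\[
  F(\pi, x, \lambda) =
  \mathbb{P}^{\pi}_{x}\!\left(\int_0^{\tau_B} r(x_t,a_t)\,dt \,\le\, \lambda\right).
\]
```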
Funding: This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61374067, 41271076).
Abstract: This paper focuses on the constrained optimality problem (COP) for first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multiple constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of a so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear program on the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear program, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results.
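The occupation-measure linear program the abstract refers to has, in its generic discounted form, roughly the following shape; this is a hedged sketch with assumed symbols, and the paper's first-passage, state-dependent-discount version is more involved.

```latex
% Hedged sketch of a generic occupation-measure linear program for a
% constrained discounted DTMDP with initial distribution \nu, discount
% \alpha, cost c, and constraint costs d_k with budgets \rho_k:
\[
\begin{aligned}
  \min_{\mu \ge 0}\;& \int c \,d\mu \\
  \text{s.t. }& \mu(\{s'\}\times A)
      = \nu(s') + \alpha \int p(s' \mid s,a)\, \mu(ds,da)
      \quad \text{for all } s', \\
  & \int d_k \, d\mu \;\le\; \rho_k, \qquad k = 1,\dots,K,
\end{aligned}
\]
% where \mu is the occupation measure of a policy; an optimal policy is
% recovered from an optimal \mu by conditioning on the state.
```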
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61374067 and 41271076).
Abstract: This paper is concerned with the convergence of a sequence of discrete-time Markov decision processes (DTMDPs) with constraints, state-action-dependent discount factors, and possibly unbounded costs. Using the convex analytic approach under mild conditions, we prove that the optimal values and optimal policies of the original DTMDPs converge to those of the "limit" one. Furthermore, we show that any countable-state DTMDP can be approximated by a sequence of finite-state DTMDPs, which are constructed using the truncation technique. Finally, we illustrate the approximation by solving a controlled queueing system numerically, and give the corresponding error bound of the approximation.
Funding: Supported by the Natural Science Foundation of China (Nos. 60874004, 60736028) and the Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2010).
Abstract: This paper considers a first passage model for discounted semi-Markov decision processes with denumerable states and nonnegative costs. The criterion to be optimized is the expected discounted cost incurred during a first passage time to a given target set. We first construct a semi-Markov decision process under a given semi-Markov decision kernel and a policy. Then, we prove that the value function satisfies the optimality equation and that there exists an optimal (or ε-optimal) stationary policy under suitable conditions, using a minimum nonnegative solution approach. Further, we give some properties of optimal policies. In addition, a value iteration algorithm for computing the value function and optimal policies is developed, and an example is given. Finally, it is shown that our model is an extension of the first passage models for both discrete-time and continuous-time Markov decision processes.
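The value iteration the abstract mentions can be illustrated, in a much simpler finite discrete-time setting, by iterating a first-passage Bellman operator; a hedged sketch (all data assumed, not the paper's semi-Markov model):

```python
# Hedged sketch: value iteration for the expected discounted cost until first
# passage to a target set B, in a *finite discrete-time* MDP.  This only
# illustrates the generic iteration; the paper's semi-Markov kernel,
# denumerable states, and precise conditions are not reproduced.
import numpy as np

def first_passage_value_iteration(P, c, target, gamma=0.95, tol=1e-8):
    # P[s, a, s'] transition probabilities, c[s, a] nonnegative one-step costs,
    # target: boolean array, True for states in the target set B.
    nS, nA, _ = P.shape
    v = np.zeros(nS)
    while True:
        # cost-to-go is zero once the target set has been reached
        q = c + gamma * P @ np.where(target, 0.0, v)   # q[s, a]
        v_new = np.where(target, 0.0, q.min(axis=1))
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmin(axis=1)             # value function, greedy policy
        v = v_new
```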