Funding: Supported by the National Natural Science Foundation of China (No. 19901038), by the Natural Science Foundation of Guangdong Province, and by Found
Abstract: In this paper, we consider constrained denumerable-state non-stationary Markov decision processes (MDPs) with the expected total reward criterion. By introducing a Lagrange multiplier and using probabilistic and analytic methods, we prove the existence of constrained optimal policies. Moreover, we prove that a constrained optimal policy may be a Markov policy, or a randomized Markov policy that randomizes between two Markov policies that differ in only one state.
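For orientation, the Lagrange-multiplier step named in this abstract can be written out as below. This is only a sketch in assumed notation: the reward $r_t$, constraint cost $c_t$, bound $d$, multiplier $\lambda$, and the direction of the inequality are illustrative choices, not taken from the paper.

```latex
% Constrained problem (assumed form): maximize expected total reward
% subject to a bound d on an expected total cost.
\[
  \max_{\pi} \; V_r(\pi) = \mathbb{E}^{\pi}\Big[\sum_{t=0}^{\infty} r_t(x_t, a_t)\Big]
  \quad \text{s.t.} \quad
  V_c(\pi) = \mathbb{E}^{\pi}\Big[\sum_{t=0}^{\infty} c_t(x_t, a_t)\Big] \le d .
\]
% Lagrangian with multiplier \lambda \ge 0.
\[
  L(\pi, \lambda) = V_r(\pi) - \lambda \big( V_c(\pi) - d \big), \qquad \lambda \ge 0 .
\]
% For fixed \lambda, maximizing L over \pi is an unconstrained MDP
% with the per-stage reward r_t - \lambda c_t.
```

For a fixed $\lambda$, the inner maximization is an ordinary non-stationary MDP, which is what makes the multiplier approach workable.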
Funding: Supported in part by the National Key R&D Program of China (No. 2021YFB3300100) and the National Natural Science Foundation of China (No. 62171062).
Abstract: Effective control of time-sensitive industrial applications depends on the real-time transmission of data from the underlying sensors. Quantifying data freshness through the age of information (AoI), we jointly design sampling and non-slot-based scheduling policies to minimize the maximum time-average age of information (MAoI) among sensors, subject to constraints on average energy cost and finite queue stability. To overcome the intractability caused by the strong coupling in this complex stochastic process, we first focus on the single-sensor time-average AoI optimization problem and convert the constrained Markov decision process (CMDP) into an unconstrained Markov decision process (MDP) by the Lagrangian method. With the infinite-time average energy and AoI expression expanded as the Bellman equation, the single-sensor time-average AoI optimization problem can be approached through the steady-state probability distribution. Further, we propose a low-complexity sub-optimal sampling and semi-distributed scheduling scheme for the multi-sensor scenario. Simulation results show that the proposed scheme reduces the MAoI significantly while achieving a balance between the sampling rate and service rate for multiple sensors.
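The single-sensor step described in this abstract (Lagrangian relaxation, Bellman equation, steady-state distribution) can be illustrated on a toy model. The sketch below is not the paper's model: the slotted age dynamics, the success probability `p`, the age cap `max_age`, the unit energy cost per sample, and the function names `solve_relaxed_mdp` and `evaluate` are all assumptions made for illustration, and the queue-stability constraint and the multi-sensor scheduling are left out.

```python
def solve_relaxed_mdp(lam, p=0.8, max_age=20, tau=0.5, iters=5000):
    """Relative value iteration for the per-step cost age + lam * energy.

    Toy model (assumed): the age grows by one each slot, capped at max_age;
    sampling costs one unit of energy and resets the age to 1 with prob. p.
    """
    n = max_age
    h = [0.0] * n                      # relative values, index i <-> age i + 1

    def q_values(i):
        nxt = min(i + 1, n - 1)        # next index if the age is not reset
        out = []
        for u in (0, 1):               # 0 = idle, 1 = sample and transmit
            succ = p * u               # probability that the age resets to 1
            move = succ * h[0] + (1.0 - succ) * h[nxt]
            # Damped (lazy) transition keeps the iteration aperiodic.
            out.append((i + 1) + lam * u + (1.0 - tau) * h[i] + tau * move)
        return out

    for _ in range(iters):
        new_h = [min(q_values(i)) for i in range(n)]
        h = [v - new_h[0] for v in new_h]   # normalize against age 1

    policy = []
    for i in range(n):
        q_idle, q_sample = q_values(i)
        policy.append(1 if q_sample < q_idle else 0)
    return policy


def evaluate(policy, p=0.8, iters=5000):
    """Average age and energy of a policy from its steady-state distribution."""
    n = len(policy)
    dist = [1.0] + [0.0] * (n - 1)     # start with all mass on age 1
    for _ in range(iters):
        new = [0.5 * d for d in dist]  # lazy step: stay put with prob. 0.5
        for i, d in enumerate(dist):
            succ = p * policy[i]
            new[0] += 0.5 * d * succ
            new[min(i + 1, n - 1)] += 0.5 * d * (1.0 - succ)
        dist = new
    avg_age = sum(d * (i + 1) for i, d in enumerate(dist))
    avg_energy = sum(d * policy[i] for i, d in enumerate(dist))
    return avg_age, avg_energy


if __name__ == "__main__":
    for lam in (0.0, 2.0, 5.0, 1000.0):
        age, energy = evaluate(solve_relaxed_mdp(lam))
        print(f"lambda={lam:7.1f}  avg age={age:6.3f}  avg energy={energy:5.3f}")
```

In this toy setting, a larger multiplier makes sampling more expensive, so the relaxed policy trades higher average age for lower average energy. Meeting a specific energy budget would then amount to searching over the multiplier, and possibly randomizing between two neighbouring deterministic policies when no single one meets the budget exactly.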