This paper considers the variance optimization problem for the average reward in continuous-time Markov decision processes (MDPs). It is assumed that the state space is countable and the action space is a Borel measurable space. The main purpose of this paper is to find the policy with minimal variance within the class of deterministic stationary policies. Unlike in a traditional Markov decision process, the cost function under the variance criterion is affected by future actions. To this end, we convert the variance minimization problem into a standard MDP by introducing a concept called pseudo-variance. Further, by giving a policy iteration algorithm for the pseudo-variance optimization problem, the optimal policy of the original variance optimization problem is derived, and a sufficient condition for the variance-optimal policy is given. Finally, we use an example to illustrate the conclusions of this paper.
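The policy iteration machinery that the pseudo-variance construction plugs into can be made concrete on a small example. The sketch below runs standard policy iteration for the long-run average reward of a toy discrete-time finite MDP; the transition matrices and rewards are invented for illustration, and the paper's continuous-time, pseudo-variance version replaces the reward rates and the evaluation step accordingly.

```python
import numpy as np

# Toy finite MDP: 3 states, 2 actions (all numbers illustrative).
# P[a][s, s'] = transition probability, r[a][s] = reward.
P = [np.array([[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.2, 0.3, 0.5]]),
     np.array([[0.3, 0.4, 0.3],
               [0.5, 0.4, 0.1],
               [0.1, 0.1, 0.8]])]
r = [np.array([1.0, 0.5, 2.0]),
     np.array([0.8, 1.5, 0.2])]
n_states, n_actions = 3, 2

def evaluate(policy):
    """Solve g + h(s) = r(s, a) + sum_s' P(s'|s, a) h(s'), with h(0) = 0."""
    A = np.zeros((n_states + 1, n_states + 1))
    b = np.zeros(n_states + 1)
    for s in range(n_states):
        a = policy[s]
        A[s, :n_states] = np.eye(n_states)[s] - P[a][s]
        A[s, n_states] = 1.0          # coefficient of the gain g
        b[s] = r[a][s]
    A[n_states, 0] = 1.0              # normalization h(0) = 0
    x = np.linalg.solve(A, b)
    return x[:n_states], x[n_states]  # bias h, gain g

policy = np.zeros(n_states, dtype=int)
while True:
    h, g = evaluate(policy)
    new_policy = np.array([max(range(n_actions),
                               key=lambda a: r[a][s] + P[a][s] @ h)
                           for s in range(n_states)])
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print("average-reward optimal policy:", policy, "gain:", g)
```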
Although small cell offloading technology can alleviate congestion in the macrocell, aggressively offloading data traffic from the macrocell to small cells can also degrade the performance of the small cells due to the heavy load. Because of collision and backoff, the degradation is significant especially in networks with contention-based channel access, and it ultimately decreases the throughput of the whole network. To find the optimal fraction of traffic to be offloaded in a heterogeneous network, we combine a Markov chain with a Poisson point process model to analyze the contention-based throughput of irregularly deployed networks. We then derive a closed-form solution for the throughput and find that it is a function of the transmit power and the density of base stations. Based on this, we propose load-aware offloading strategies via power control and base-station density adjustment. The numerical results verify our analysis and show a great performance gain compared with non-load-aware offloading.

Funding: supported by the National High-Tech R&D Program (863 Program) under Grant No. 2015AA01A705; the Beijing Municipal Science and Technology Commission research fund project under Grant No. D151100000115002; the China Scholarship Council under Grant No. 201406470038; and the BUPT youth scientific research innovation program under Grant No. 500401238.
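As a rough illustration of the modeling ingredients, not the paper's closed form, the following sketch deploys small cells as a Poisson point process, spreads a tunable fraction of users over them, and scores each cell with a slotted-ALOHA-style contention success rate. All densities, the access probability, and the macrocell capacity are made-up toy numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
area = 1.0                       # km^2 observation window
lam_small, lam_user = 30, 200    # densities per km^2 (illustrative)

def contention_throughput(n, p=0.1):
    """Slotted-ALOHA style success rate for n users contending in one cell."""
    return 0.0 if n == 0 else n * p * (1 - p) ** (n - 1)

n_small = rng.poisson(lam_small * area)   # PPP-deployed small cells
n_user = rng.poisson(lam_user * area)

for rho in [0.0, 0.25, 0.5, 0.75, 1.0]:   # fraction offloaded to small cells
    offloaded = int(rho * n_user)
    # offloaded users spread uniformly over the small cells
    load = rng.multinomial(offloaded, np.ones(n_small) / n_small)
    small_tput = sum(contention_throughput(n) for n in load)
    macro_tput = min(n_user - offloaded, 50) * 0.5  # capacity-limited macrocell (toy)
    print(f"rho={rho:.2f}  total throughput ~ {small_tput + macro_tput:.2f}")
```

Sweeping the offload fraction in this way exhibits the trade-off the paper optimizes: too little offloading saturates the macrocell, too much drives up per-cell contention.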
Testing is the premise and foundation of realizing equipment health management (EHM). To address the problem that a static periodic test strategy may cause deficient or excessive testing, a dynamic sequential test strategy (DSTS) for EHM is presented. Considering that the equipment health state is not completely observable in reality, a DSTS optimization method based on a partially observable semi-Markov decision process (POSMDP) is proposed. Firstly, an equipment health-state degradation model is constructed as a Markov process, and a control-limit maintenance policy is introduced. Secondly, the POSMDP is formulated in detail. Then, the POSMDP is converted into a completely observable belief semi-Markov decision process (BSMDP) through the belief state. The optimality equation and the corresponding optimal DSTS, which minimize the long-run expected average cost per unit time, are obtained from the BSMDP. The results of an application to complex equipment show that the proposed DSTS is feasible and effective.

Funding: supported by the National Natural Science Foundation of China (51175502).
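The belief-state conversion at the heart of the POSMDP-to-BSMDP step is an ordinary Bayes filter: predict the hidden health state through the transition matrix, then condition on the test outcome. A minimal sketch with an invented three-state health model and binary test outcomes (the sojourn-time distributions of the semi-Markov process are omitted for brevity):

```python
import numpy as np

# Toy 3-state health model: 0 = healthy, 1 = degraded, 2 = failed.
P = np.array([[0.90, 0.08, 0.02],      # hidden-state transition matrix
              [0.00, 0.85, 0.15],
              [0.00, 0.00, 1.00]])
# O[s, z]: probability of test outcome z given the true state s.
O = np.array([[0.9, 0.1],
              [0.4, 0.6],
              [0.1, 0.9]])

def belief_update(b, z):
    """One Bayes step: predict through P, then condition on outcome z."""
    predicted = b @ P
    unnormalized = predicted * O[:, z]
    return unnormalized / unnormalized.sum()

b = np.array([1.0, 0.0, 0.0])          # start certainly healthy
for z in [0, 0, 1, 1]:                 # a sequence of observed test outcomes
    b = belief_update(b, z)
    print(np.round(b, 3))
```

The belief vector b is the fully observable state on which the BSMDP's optimality equation is written.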
This paper studies the strong n (n = -1, 0)-discount and finite-horizon criteria for continuous-time Markov decision processes in Polish spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. Under mild conditions, the authors prove the existence of strong n (n = -1, 0)-discount optimal stationary policies by developing two equivalence relations: one between the standard expected average reward and strong -1-discount optimality, and the other between the bias and strong 0-discount optimality. The authors also prove the existence of an optimal policy for a finite-horizon control problem by developing an interesting characterization of a canonical triplet.

Funding: supported by the National Natural Science Foundation of China under Grant Nos. 61374080 and 61374067; the Natural Science Foundation of Zhejiang Province under Grant No. LY12F03010; the Natural Science Foundation of Ningbo under Grant No. 2012A610032; and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.
An effort was made to model dynamic optimal advertising with the uncertainty of sales responses taken into consideration. The dynamic advertising problem was depicted as a Markov decision process with two state variables. When a firm launches an advertising campaign, it may predict the probability that the campaign will obtain a sales response; this probability was chosen as one state variable. Cumulative sales volume, which varies randomly with advertising, was chosen as the other state variable. The only decision variable was advertising expenditure. With these variables, a multi-stage Markov decision process model was formulated. The model was analyzed on the basis of several propositions; some analytical results about the optimal strategy were derived, and their practical implications explained.

Funding: supported by the National Natural Science Foundation of China (No. 70271021).
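A multi-stage formulation of this kind is typically solved by backward induction. The sketch below is a heavily simplified, invented instance: for brevity it keeps cumulative sales as the single state variable and folds the response probability into a made-up response function of the ad spend, rather than carrying it as the paper's second state variable.

```python
import numpy as np

T = 4                               # planning horizon (stages)
sales_levels = np.arange(0, 11)     # discretized cumulative sales volume
spends = [0.0, 1.0, 2.0]            # admissible advertising budgets

def response_prob(a):
    """Probability the campaign lifts sales by one unit (toy curve)."""
    return 0.2 + 0.6 * (1 - np.exp(-a))

margin = 1.5                        # profit per unit sold (illustrative)
V = np.zeros(len(sales_levels))     # terminal value
for t in reversed(range(T)):        # backward induction over stages
    Q = np.empty((len(sales_levels), len(spends)))
    for i, s in enumerate(sales_levels):
        for j, a in enumerate(spends):
            p = response_prob(a)
            up = min(i + 1, len(sales_levels) - 1)
            Q[i, j] = -a + p * (margin + V[up]) + (1 - p) * V[i]
    V = Q.max(axis=1)
    policy_t = np.take(spends, Q.argmax(axis=1))
    print(f"stage {t}: optimal spend by sales level -> {policy_t}")
```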
This paper studies denumerable continuous-time Markov decision processes with expected total reward criteria. The authors first study the unconstrained model with possibly unbounded transition rates and give suitable conditions on the controlled system's primitive data under which they show the existence of a solution to the total reward optimality equation and the existence of an optimal stationary policy. Then the authors impose a constraint on an expected total cost and consider the associated constrained model. Building on the results for the unconstrained model and using the Lagrange multiplier approach, the authors prove the existence of constrained-optimal policies under some additional conditions. Finally, the authors apply the results to controlled queueing systems.

Funding: supported by the National Natural Science Foundation of China under Grant Nos. 10925107 and 60874004.
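The Lagrange multiplier approach can be illustrated on a tiny discrete-time analogue: scalarize the reward as r - lambda * c, solve the unconstrained problem, and search over the multiplier until the cost constraint binds. Everything below (the absorbing toy MDP, the budget, the bisection) is an invented sketch, not the paper's construction; note also that at the critical multiplier a true constrained optimum may randomize between two deterministic policies, which the Lagrangian theory in the paper accounts for.

```python
import numpy as np

# Toy discrete-time absorbing MDP: states {0, 1} transient, 2 absorbing.
# P[a][s, s'] transitions, r[a][s] rewards, c[a][s] constraint costs.
P = [np.array([[0.5, 0.3, 0.2], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]]),
     np.array([[0.1, 0.2, 0.7], [0.0, 0.2, 0.8], [0.0, 0.0, 1.0]])]
r = [np.array([1.0, 1.0, 0.0]), np.array([3.0, 2.0, 0.0])]
c = [np.array([0.0, 0.0, 0.0]), np.array([2.0, 2.0, 0.0])]

def solve_scalarized(lam, iters=500):
    """Value iteration for total reward with scalarized reward r - lam * c."""
    V = np.zeros(3)
    for _ in range(iters):
        V = np.max([r[a] - lam * c[a] + P[a] @ V for a in range(2)], axis=0)
    return np.argmax([r[a] - lam * c[a] + P[a] @ V for a in range(2)], axis=0)

def total_cost(policy, start=0):
    """Expected total constraint cost from `start` (transient block only)."""
    Q = np.array([P[policy[s]][s, :2] for s in range(2)])
    cp = np.array([c[policy[s]][s] for s in range(2)])
    return np.linalg.solve(np.eye(2) - Q, cp)[start]

budget, lo, hi = 1.0, 0.0, 10.0
for _ in range(50):                       # bisect on the Lagrange multiplier
    lam = 0.5 * (lo + hi)
    if total_cost(solve_scalarized(lam)) > budget:
        lo = lam                          # constraint violated: penalize cost more
    else:
        hi = lam
policy = solve_scalarized(hi)
print("multiplier ~", round(hi, 4), "policy:", policy,
      "expected cost:", round(total_cost(policy), 3))
```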
In this paper, we provide a new theoretical framework of pyramid Markov processes to solve some open and fundamental problems of blockchain selfish mining in a rigorous mathematical setting. We first describe a more general model of blockchain selfish mining with both a two-block leading competitive criterion and a new economic incentive mechanism. Then we establish a pyramid Markov process and show that it is irreducible and positive recurrent, and that its stationary probability vector is matrix-geometric with an explicitly representable rate matrix. We also use the stationary probability vector to study the influence of orphan blocks on the waste of computing resources. Next, we set up a pyramid Markov reward process to investigate the long-run average mining profits of the honest and dishonest mining pools, respectively. As a by-product, we build one-dimensional Markov reward processes and provide some new and interesting interpretations of the Markov chain and the revenue analysis reported in the seminal work of Eyal and Sirer (2014). The pyramid Markov (reward) processes open up a new avenue in the study of blockchain selfish mining, and we hope that the methodology and results developed in this paper shed light on the problem so that a series of promising research directions can be developed.

Funding: supported by the National Key R&D Program of China under Grant No. 2020AAA0103801. Quanlin Li is supported by the National Natural Science Foundation of China under Grant Nos. 71671158 and 71932002 and by the Beijing Social Science Foundation Research Base Project under Grant No. 19JDGLA004. Xiaole Wu is supported by the National Natural Science Foundation of China under Grant No. 72025102.
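The one-dimensional chain of Eyal and Sirer (2014) that the paper reinterprets can be checked numerically. Below is a compact Monte Carlo sketch of that chain, not of the pyramid process itself; alpha is the selfish pool's share of hash power and gamma the fraction of honest miners that build on the pool's branch during a tie.

```python
import random

def selfish_mining_revenue(alpha, gamma, n_blocks=200_000, seed=1):
    """Monte Carlo estimate of the selfish pool's relative revenue
    in the Eyal-Sirer state machine (lead = private-chain advantage)."""
    rng = random.Random(seed)
    lead, tie = 0, False
    pool, honest = 0, 0
    for _ in range(n_blocks):
        if rng.random() < alpha:               # pool finds the next block
            if tie:                            # pool resolves the 1-1 race
                pool += 2
                tie = False
            else:
                lead += 1
        else:                                  # honest network finds it
            if tie:                            # race resolved by honest miners
                if rng.random() < gamma:
                    pool += 1; honest += 1     # they built on the pool's branch
                else:
                    honest += 2                # they built on the honest branch
                tie = False
            elif lead == 0:
                honest += 1
            elif lead == 1:
                tie = True                     # pool publishes, 1-1 fork
                lead = 0
            elif lead == 2:
                pool += 2                      # pool publishes both, wins race
                lead = 0
            else:
                pool += 1                      # one pool block becomes final
                lead -= 1
    return pool / (pool + honest)

for alpha in [0.1, 0.25, 0.33, 0.4]:
    print(alpha, round(selfish_mining_revenue(alpha, gamma=0.5), 3))
```

For gamma = 0.5 the simulated relative revenue exceeds the honest baseline alpha once alpha passes roughly 0.25, reproducing the well-known profitability threshold.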
At any given time, a product stock manager is expected to carry out activities to check his or her holdings in general and to monitor the condition of the stock in particular. The manager should monitor the available level or quantity of any given product or item and, on the basis of observations of the movements of previous periods, may decide whether or not to order a certain quantity of products. This paper discusses the applicability of discrete-time Markov chains to making relevant decisions for the management of a stock of COTRA-Honey products. A Markov chain model based on the transition matrix and equilibrium probabilities was developed to help managers predict the likely state of the stock in order to anticipate procurement decisions in the short, medium or long term. The objective of any manager is to ensure efficient management by limiting overstocking, minimising the risk of stock-outs as much as possible and maximising profits. The resulting Markov chain model allows the manager to predict whether or not to order for the period following the current one, and if so, how much.
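The two computations such a model rests on, equilibrium probabilities and n-step forecasts, fit in a few lines. A minimal sketch with an invented three-level stock chain (the real transition matrix would be estimated from the COTRA-Honey stock movements):

```python
import numpy as np

# Hypothetical stock states: 0 = low, 1 = medium, 2 = high.
# Row-stochastic transition matrix estimated from past stock movements.
T = np.array([[0.2, 0.5, 0.3],
              [0.3, 0.4, 0.3],
              [0.5, 0.4, 0.1]])

# Equilibrium probabilities: solve pi T = pi subject to sum(pi) = 1.
A = np.vstack([T.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print("equilibrium distribution:", np.round(pi, 4))

# n-step forecast from the current state (say, medium stock):
print("3 periods ahead:", np.round(np.linalg.matrix_power(T, 3)[1], 4))
```

The equilibrium row tells the manager the long-run share of time spent in each stock level, while the n-step row supports the short- and medium-term ordering decision.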
Due to various advantages in storage and implementation, simple strategies are usually preferred to complex strategies when their performances are close. Strategy optimization for controlled Markov processes with a descriptive complexity constraint provides a general framework for many such problems. In this paper, we first show by examples that the descriptive complexity and the performance of a strategy can be independent, and we use the F-matrix in the No-Free-Lunch Theorem to show the risk that approximating complex strategies may lead to simple strategies that are unboundedly worse in cardinal performance than the original complex strategies. We then develop a method that handles the descriptive complexity constraint directly, describing simple strategies exactly and only approximating complex strategies during the optimization. The ordinal performance difference between the resulting strategies of this selective approximation method and the global optimum is quantified. Numerical examples on an engine maintenance problem show how this method improves the solution quality. We hope this work sheds some insight on solving general strategy optimization for controlled Markov processes with a descriptive complexity constraint.

Funding: supported by the National Natural Science Foundation of China (Grant Nos. 60274011, 60574067, 60704008, 60736027, 60721003, 90924001); the New Century Excellent Talents in University program (Grant No. NCET-04-0094); the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20070003110); and the Programme of Introducing Talents of Discipline to Universities (the National 111 International Collaboration Projects) (Grant No. B06002).
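The flavor of the comparison, evaluating a descriptively simple class against the unrestricted optimum, can be shown on an invented engine-maintenance toy: threshold (control-limit) rules are the simple class, and all deterministic policies the complex one. In this particular toy a threshold rule happens to be optimal; the paper's warning is precisely that in general the gap between such a simple class and the unrestricted optimum can be unbounded.

```python
import numpy as np
from itertools import product

# Toy engine-deterioration MDP: states 0 (new) .. 3 (worn), all numbers invented.
wear = np.array([[0.3, 0.5, 0.2, 0.0],
                 [0.0, 0.4, 0.4, 0.2],
                 [0.0, 0.0, 0.5, 0.5],
                 [0.0, 0.0, 0.0, 1.0]])   # action 0: keep running
run_cost = np.array([0.0, 1.0, 3.0, 8.0])
repair_cost = 4.0                          # action 1: repair, reset to state 0

def average_cost(policy):
    """Long-run average cost of a deterministic policy (tuple of 0/1)."""
    P = np.array([np.eye(4)[0] if policy[s] else wear[s] for s in range(4)])
    c = np.array([repair_cost if policy[s] else run_cost[s] for s in range(4)])
    A = np.vstack([P.T - np.eye(4), np.ones(4)])       # pi P = pi, sum pi = 1
    pi = np.linalg.lstsq(A, np.r_[np.zeros(4), 1.0], rcond=None)[0]
    return pi @ c

all_policies = list(product([0, 1], repeat=4))          # unrestricted class
thresholds = [tuple(int(s >= k) for s in range(4)) for k in range(5)]
best_all = min(all_policies, key=average_cost)
best_thr = min(thresholds, key=average_cost)
print("unrestricted optimum:", best_all, round(average_cost(best_all), 3))
print("best threshold rule: ", best_thr, round(average_cost(best_thr), 3))
```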