Abstract: In this work, for a controlled consumption-investment process under the discounted reward optimization criterion, a numerical estimate of the stability index is obtained. Using explicit formulas for the optimal stationary policies and the value functions, the stability index is calculated explicitly. Its asymptotic behavior as the discount coefficient approaches 1 is then investigated through numerical experiments and statistical techniques. The results identify conditions under which an approximate optimal stationary policy can be used to control the original process.
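The stability index idea can be illustrated on a toy finite model. The sketch below makes several assumptions:

- It uses a hypothetical two-state, two-action discounted MDP, not the consumption-investment process of the paper.
- It takes the stability index to be the largest gap, over states, between the optimal value of the original model and the value in the original model of the policy that is optimal for an approximating model.
- It uses policy iteration in place of the paper's explicit formulas.
- The arrays `P`, `P_approx`, `r` and the helper names are illustrative only.

```python
import numpy as np

def evaluate(P, r, beta, policy):
    """Discounted value of a deterministic stationary policy: V = (I - beta P_f)^{-1} r_f.
    P has shape (A, S, S); r has shape (S, A)."""
    S = r.shape[0]
    idx = np.arange(S)
    P_f = P[policy, idx, :]          # row s is P[policy[s], s, :]
    r_f = r[idx, policy]
    return np.linalg.solve(np.eye(S) - beta * P_f, r_f)

def policy_iteration(P, r, beta, tol=1e-9):
    """Optimal discounted value and a deterministic stationary optimal policy."""
    S = r.shape[0]
    idx = np.arange(S)
    policy = np.zeros(S, dtype=int)
    while True:
        V = evaluate(P, r, beta, policy)
        Q = r + beta * np.einsum("ast,t->sa", P, V)   # Q[s, a] = r[s, a] + beta * sum_t P[a, s, t] V[t]
        greedy = Q.argmax(axis=1)
        # switch only where the gain exceeds a relative tolerance, so the loop terminates
        improve = Q[idx, greedy] > Q[idx, policy] + tol * (1.0 + np.abs(V))
        if not improve.any():
            return V, policy
        policy = np.where(improve, greedy, policy)

def stability_index(P, P_approx, r, beta):
    """Largest loss (over states) from controlling the original model
    with the policy that is optimal for the approximating model."""
    V_star, _ = policy_iteration(P, r, beta)
    _, f_tilde = policy_iteration(P_approx, r, beta)
    return np.max(V_star - evaluate(P, r, beta, f_tilde))

# Hypothetical original and perturbed transition laws (rows sum to 1).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
P_approx = np.array([[[0.85, 0.15], [0.25, 0.75]],
                     [[0.55, 0.45], [0.6, 0.4]]])
r = np.array([[1.0, 0.5], [0.0, 0.8]])

for beta in (0.9, 0.99, 0.999):   # watch the index as the discount factor approaches 1
    print(beta, stability_index(P, P_approx, r, beta))
```

When the approximating model equals the original one, the index is zero by construction. As `beta` grows, the discounted values scale roughly like 1/(1 - beta), which is why the behavior near 1 needs separate study.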
Abstract: This paper considers the variance optimization problem for the average reward in a continuous-time Markov decision process (MDP). The state space is assumed to be countable and the action space a Borel measurable space. The main goal is to find a policy with minimal variance within the space of deterministic stationary policies. Unlike in a traditional MDP, the cost function in the variance criterion is affected by future actions. To address this, we convert the variance minimization problem into a standard MDP by introducing a concept called pseudo-variance. We then give a policy iteration algorithm for the pseudo-variance optimization problem, derive from it the optimal policy of the original variance optimization problem, and give a sufficient condition for a variance-optimal policy. Finally, an example illustrates the results.
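A small numerical sketch may help show why the variance criterion is not a standard MDP cost, and how centring at a constant restores one.

The sketch assumes a finite, irreducible continuous-time chain under one fixed deterministic stationary policy, with hypothetical generator `Q` and reward rate `r`. It also assumes the steady-state variance is defined as sum_i pi(i) (r(i) - eta)^2, where eta = pi . r depends on the whole policy.

Replacing eta with a fixed constant lam gives a pseudo-variance whose cost (r(i) - lam)^2 is an ordinary per-state cost. These definitions are our reading of the setup, stated only for illustration.

```python
import numpy as np

def stationary_distribution(Q):
    """Solve pi Q = 0, sum(pi) = 1 for a finite irreducible generator Q."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])   # balance equations plus normalization row
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def average_reward_and_variance(Q, r):
    """Long-run average reward eta and steady-state variance sum_i pi_i (r_i - eta)^2."""
    pi = stationary_distribution(Q)
    eta = pi @ r
    return eta, pi @ (r - eta) ** 2

def pseudo_variance(Q, r, lam):
    """Like the variance, but centred at a fixed constant lam instead of eta."""
    pi = stationary_distribution(Q)
    return pi @ (r - lam) ** 2

# Hypothetical two-state chain under one fixed policy: pi = (1/3, 2/3).
Q = np.array([[-2.0, 2.0],
              [1.0, -1.0]])
r = np.array([3.0, 0.0])
eta, var = average_reward_and_variance(Q, r)   # eta = 1, var = 2
```

The identity pseudo_variance = var + (eta - lam)^2 holds. So the pseudo-variance bounds the variance from above and coincides with it at lam = eta. For fixed lam, minimizing it is an ordinary average-cost problem.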
Funding: Partially supported by the National Natural Science Foundation of China (61661025, 61661026) and the Foundation of the Hundred Youth Talents Training Program of Lanzhou Jiaotong University (152022).
Abstract: A network selection optimization algorithm based on the Markov decision process (MDP) is proposed so that mobile terminals can always connect to the best wireless network in a heterogeneous network environment. Considering different types of service requirements, the MDP model and its reward function are constructed from the quality of service (QoS) attribute parameters of mobile users, and the network attribute weights are calculated using the analytic hierarchy process (AHP). The network handoff decision condition is designed according to the different types of user services and the time-varying characteristics of the network. The MDP model is then solved using a hybrid genetic algorithm and simulated annealing (GA-SA), so that users can seamlessly switch to the network with the best long-term expected reward. Simulation results show that the proposed algorithm converges well and ensures that users with different service types obtain satisfactory expected total rewards with a low number of network handoffs.
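The AHP step that produces the network attribute weights can be sketched as follows. This is the standard principal-eigenvector form of AHP with Saaty's consistency ratio. The pairwise comparison matrix and the three QoS attributes named in the comment are hypothetical, not values from the paper.

```python
import numpy as np

# Saaty's random consistency index RI for matrix orders 1..9
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(A):
    """Return (weights, consistency ratio) for a reciprocal pairwise-comparison matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)              # principal (Perron) eigenvalue
    w = np.abs(eigvecs[:, k].real)           # Perron vector has one sign; take magnitudes
    w /= w.sum()                             # normalize weights to sum to 1
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0          # CR < 0.1 is the usual acceptance rule
    return w, cr

# Hypothetical comparisons of three QoS attributes: bandwidth, delay, cost.
A = [[1.0, 3.0, 5.0],
     [1/3, 1.0, 2.0],
     [1/5, 1/2, 1.0]]
weights, cr = ahp_weights(A)
```

For a perfectly consistent matrix (entries w_i / w_j), the function returns w exactly and a consistency ratio of zero. That makes it a convenient sanity check.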
Abstract: This paper investigates a semi-Markov decision process (SMDP) with a Borel state space under the expected total reward criterion in a semi-Markov environment. It describes a system that behaves like an SMDP, except that the system is influenced by its environment, which is modeled by a semi-Markov process. Under the condition that the rewards are all positive or all negative, we transform the SMDP in a semi-Markov environment into an equivalent discrete-time Markov decision process. We then obtain the optimality equation and some of its properties.
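Abstract: To improve the accuracy of personalized recommendation, a "null state" is first defined in the state space of a multidimensional semi-Markov process, yielding an extended multidimensional semi-Markov process. Combining this process with social network analysis theory gives a social network information flow model, which describes how information flows among the members of a social network. Based on this model, a collaborative filtering algorithm, SMRR (Semi-Markov and reward renewal), is then proposed. Experiments show that SMRR achieves markedly higher prediction accuracy than existing algorithms, because it jointly considers users' own preferences and the influence of other members of the social network.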
Funding: This work is supported by the National Key R&D Program of China under Grant No. 2020AAA0103801. Quanlin Li is supported by the National Natural Science Foundation of China under Grant Nos. 71671158 and 71932002 and by the Beijing Social Science Foundation Research Base Project under Grant No. 19JDGLA004. Xiaole Wu is supported by the National Natural Science Foundation of China under Grant No. 72025102.
Abstract: In this paper, we provide a new theoretical framework of pyramid Markov processes to solve some open and fundamental problems of blockchain selfish mining in a rigorous mathematical setting. We first describe a more general model of blockchain selfish mining with both a two-block leading competitive criterion and a new economic incentive mechanism. We then establish a pyramid Markov process and show that it is irreducible and positive recurrent, and that its stationary probability vector is matrix-geometric with an explicitly representable rate matrix. We also use the stationary probability vector to study the influence of orphan blocks on the waste of computing resources. Next, we set up a pyramid Markov reward process to investigate the long-run average mining profits of the honest and dishonest mining pools, respectively. As a by-product, we build one-dimensional Markov reward processes and provide some new interpretations of the Markov chain and the revenue analysis reported in the seminal work by Eyal and Sirer (2014). The pyramid Markov (reward) processes can open up a new avenue in the study of blockchain selfish mining. We hope that the methodology and results developed in this paper shed light on blockchain selfish mining and lead to a series of promising research directions.
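For readers unfamiliar with matrix-geometric solutions, the sketch below works with a generic level-independent continuous-time quasi-birth-death (QBD) process. It computes the rate matrix R, the minimal nonnegative solution of A0 + R A1 + R^2 A2 = 0, by the classical fixed-point iteration. It then forms the level probabilities pi_k = pi_0 R^k.

The paper obtains an explicit expression for R in its pyramid process; this numerical iteration is a different, generic route. The block matrices below (a hypothetical two-phase queue with Markov-modulated arrivals) and the helper names are assumptions for illustration.

```python
import numpy as np

def rate_matrix(A0, A1, A2, tol=1e-13, max_iter=100_000):
    """Minimal nonnegative solution R of A0 + R A1 + R^2 A2 = 0 for a continuous-time QBD.
    A0: transitions one level up, A1: within a level, A2: one level down."""
    A0, A1, A2 = (np.asarray(M, dtype=float) for M in (A0, A1, A2))
    A1_inv = np.linalg.inv(A1)
    R = np.zeros_like(A0)                      # start at 0; iterates increase monotonically
    for _ in range(max_iter):
        R_next = -(A0 + R @ R @ A2) @ A1_inv
        if np.max(np.abs(R_next - R)) < tol:
            return R_next
        R = R_next
    raise RuntimeError("rate-matrix iteration did not converge")

def level_probabilities(pi0, R, levels):
    """Matrix-geometric form pi_k = pi_0 R^k for k = 0, ..., levels - 1."""
    out = [np.asarray(pi0, dtype=float)]
    for _ in range(levels - 1):
        out.append(out[-1] @ R)
    return out

# Hypothetical two-phase QBD: phases switch 0->1 at rate 1 and 1->0 at rate 2,
# arrivals at rate 1 or 2 depending on the phase, service at rate 4.
A0 = np.diag([1.0, 2.0])
A1 = np.array([[-6.0, 1.0],
               [2.0, -8.0]])
A2 = np.diag([4.0, 4.0])
R = rate_matrix(A0, A1, A2)

# Sanity case: M/M/1 with arrival rate 1 and service rate 2 as 1x1 blocks.
R_mm1 = rate_matrix([[1.0]], [[-3.0]], [[2.0]])
```

In the M/M/1 case the iteration recovers R = lambda/mu = 0.5. With pi_0 = 1 - rho, the level probabilities are the familiar geometric ones. The spectral radius of R being below 1 reflects positive recurrence.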