Funding: Project (2010QZZD021) supported by the Fundamental Research Funds for the Central Universities, China; Project (2015F024) supported by the China Railway Science and Technology Research Development Program.
Abstract: To apply the overbooking concept to the Chinese railway freight industry and improve revenue, a Markov decision process (dynamic programming) model for railway freight reservation was formulated, and the overbooking limit was proposed as a control policy. However, computing the dynamic program requires six nested loops, which is burdensome for real-world problem sizes. To overcome this computational limit, the properties of the value function were analyzed and an overbooking protection level was proposed to reduce the amount of computation. Simulation experiments show that the overbooking protection level for the lower-fare class is higher than that for the higher-fare class, so the overbooking strategy is nested by fare class. Moreover, analysis of how freight arrival probability and cancellation probability influence the overbooking strategy shows that the proposed approach is efficient and has good prospects for practical application. Compared with the existing first-come-first-served (FCFS) reservation policy, the overbooking strategy performs better in reducing vacancies and improving revenue.
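The paper's six-loop dynamic program is not reproduced here, but the core trade-off behind an overbooking limit can be sketched with a much simpler static model: accept b bookings (b at least the capacity), assume each booking independently shows up with some probability, and charge a penalty for every denied shipment. All names and parameter values below are illustrative assumptions, not the paper's formulation.

```python
from math import comb

def expected_overbooking_profit(b, capacity, show_prob, fare, penalty):
    # Expected denied boardings when b bookings are accepted and each
    # booking shows up independently with probability show_prob.
    expected_denials = sum(
        (k - capacity) * comb(b, k) * show_prob**k * (1 - show_prob)**(b - k)
        for k in range(capacity + 1, b + 1)
    )
    return fare * b - penalty * expected_denials

def best_overbooking_limit(capacity, show_prob, fare, penalty, search_to=None):
    # Brute-force search for the booking limit maximizing expected profit.
    search_to = search_to or 2 * capacity
    return max(range(capacity, search_to + 1),
               key=lambda b: expected_overbooking_profit(
                   b, capacity, show_prob, fare, penalty))
```

With a 10-unit capacity, a 90% show-up rate, and a denial penalty of twice the fare, the search accepts one extra booking; with no cancellations it accepts none, matching intuition.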
Funding: supported by the National Natural Science Foundation of China under Grant No. 61101107; the Scientific Research and Innovation Plan for the Youth of BUPT under Grant No. 2011RC0305; and the State Major Science and Technology Special Projects under Grant No. 2012ZX03004001.
Abstract: Due to the problems of spectrum underutilization and energy inefficiency in wireless communications, research on energy-efficient Cognitive Radio Networks (CRNs) has received significant attention in both industry and academia. In this paper, we consider the problem of optimal spectrum selection and transmission parameter design with the objective of minimizing energy consumption in CRNs. Since the system state cannot be directly observed due to missed detections and estimation errors, we formulate the optimal spectrum access problem as a Partially Observable Markov Decision Process (POMDP). In particular, the proposed scheme selects the optimal spectrum, modulation and coding scheme, transmission power, and link-layer frame size in each time slot according to the belief state, which captures all the history information of past actions and observations. The optimal policy can be acquired by solving the POMDP problem with a linear-programming-based algorithm. Simulation results show that significant energy savings can be achieved by the proposed scheme.
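The belief state in such a POMDP is maintained by Bayesian filtering. The sketch below assumes a two-state busy/idle channel following a simple Markov model and a sensor characterized by false-alarm and detection probabilities; the parameter names are illustrative, not the paper's exact formulation. It performs one predict-correct belief update from a noisy sensing outcome:

```python
def update_belief(belief_idle, observed_idle, p_false_alarm, p_detection,
                  p_idle_to_idle, p_busy_to_idle):
    # Predict step: propagate the belief through the channel's Markov chain.
    prior_idle = belief_idle * p_idle_to_idle + (1 - belief_idle) * p_busy_to_idle
    # Correct step: Bayes' rule with the sensing model.
    if observed_idle:
        lik_idle, lik_busy = 1 - p_false_alarm, 1 - p_detection
    else:
        lik_idle, lik_busy = p_false_alarm, p_detection
    num = lik_idle * prior_idle
    return num / (num + lik_busy * (1 - prior_idle))
```

An "idle" observation from a reliable sensor pushes the belief toward idle, and a "busy" observation pushes it the other way; the belief thus summarizes the whole observation history in one number.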
Funding: Project (60873082) supported by the National Natural Science Foundation of China; Project (09C794) supported by the Natural Science Foundation of the Education Department of Hunan Province, China; Project (S2008FJ3078) supported by the Science and Technology Program Foundation of Hunan Province, China; Project (07JJ6109) supported by the Natural Science Foundation of Hunan Province, China.
Abstract: To address the issue of resource scarcity in wireless communication, a novel dynamic call admission control scheme for wireless mobile networks was proposed. The scheme established a reward computing model of call admission for a wireless cell based on a Markov decision process, and dynamically optimized the call admission process according to the principle of maximizing the average system reward. Extensive simulations were conducted to examine the performance of the model by comparing it with other policies in terms of new call blocking probability, handoff call dropping probability, and resource utilization rate. Experimental results show that the proposed scheme adapts better to changes in traffic conditions than existing protocols. Under high call traffic load, handoff call dropping probability and new call blocking probability can be reduced by about 8%, and resource utilization rate can be improved by 2%-6%. The proposed scheme can achieve a high resource utilization rate of about 85%.
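The MDP-derived policy here is state-dependent, but the classic baseline that such schemes generalize is the guard-channel rule, which reserves a few channels so that handoff calls (whose dropping is costlier than blocking a new call) can still get in when the cell is nearly full. A minimal sketch, illustrative rather than the paper's optimized policy:

```python
def admit(call_type, busy_channels, total_channels, guard_channels):
    # Handoff calls may use any free channel; new calls are blocked
    # once occupancy reaches the guard threshold.
    if call_type == "handoff":
        return busy_channels < total_channels
    return busy_channels < total_channels - guard_channels
```

Raising the number of guard channels trades higher new-call blocking for lower handoff dropping, which is exactly the tension the MDP reward model resolves adaptively.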
Funding: supported in part by the National Natural Science Foundation of China under Grants No. 61101113, No. 61072088, and No. 61201198; the Beijing Natural Science Foundation under Grants No. 4132007, No. 4132015, and No. 4132019; and the Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20111103120017.
Abstract: In Cognitive Radio (CR) networks, there is a common assumption that secondary devices always obey commands and are under full control. However, this assumption may become unrealistic for future CR networks with more intelligent, sophisticated, and autonomous devices. Imperfect spectrum sensing and illegal behaviour of secondary users can result in harmful interference to primary users. In this paper, we propose a novel concept of Proactive-Optimization CR (POCR) networks, in which highly intelligent secondary users always try to proactively consider potentially harmful interference when making their behaviour decisions. Furthermore, we propose an optimal transmission behaviour decision scheme for secondary users in POCR networks, considering the possible harmful interference and penalties from primary users. Specifically, we formulate the system as a Partially Observable Markov Decision Process (POMDP) problem. With this formulation, a low-complexity dynamic programming framework is presented to obtain the optimal behaviour policy. Extensive simulation results illustrate the significant performance improvement of the proposed scheme compared with an existing one that ignores the proactive optimization of secondary users.
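A full POMDP policy weighs future penalties over many slots, but the one-slot (myopic) version of the secondary user's decision already illustrates the trade-off being optimized: transmit only when the expected throughput reward outweighs the expected interference penalty, given the current belief that no primary user is active. The names and the linear reward/penalty model below are illustrative assumptions:

```python
def should_transmit(belief_primary_absent, reward, penalty):
    # Myopic decision: expected gain from transmitting versus expected
    # penalty for interfering with an active primary user.
    return belief_primary_absent * reward > (1 - belief_primary_absent) * penalty
```

With a heavy penalty, the secondary user stays silent unless it is quite confident the channel is free, which is the "proactive" caution the POCR concept formalizes.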
Abstract: In multiagent reinforcement learning, with different assumptions about the opponents' policies, an agent adopts quite different learning rules and achieves different learning performance. We prove that, in multiagent domains, convergence of the Q values is guaranteed only when an agent behaves optimally and its opponents' strategies satisfy certain conditions, and that an agent achieves the best learning performance when it adopts the same learning algorithm as its opponents.
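The Q values whose convergence is at issue are those of standard Q-learning; for reference, the tabular update that each agent runs (and against which the convergence conditions are stated) can be sketched as follows, with a hypothetical state/action encoding:

```python
from collections import defaultdict

def make_q_table(actions):
    # Tabular Q function: Q[state][action], defaulting to 0.
    return defaultdict(lambda: {a: 0.0 for a in actions})

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    # One temporal-difference backup toward r + gamma * max_a' Q(s', a').
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
```

In the multiagent setting the reward r and next state s_next depend on the opponents' joint action, which is why convergence requires assumptions on their strategies.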
Funding: We also acknowledge the support of the National Natural Science Foundation of China (Grant No. 60574071).
Abstract: Considering the dynamic character of repeated games and Markov processes, this paper presented a novel dynamic decision model for symmetric repeated games. In this model, players' actions were mapped to a Markov decision process with payoffs, and the Boltzmann distribution was introduced. Our dynamic model differs from previous ones; we used it to study the iterated prisoner's dilemma, and the results show that this decision model can successfully be applied to symmetric repeated games and has adaptive learning ability.
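The Boltzmann distribution mentioned here turns action payoffs into selection probabilities via a temperature parameter; a minimal sketch (illustrative, not the paper's full decision model):

```python
import math

def boltzmann_probs(values, temperature):
    # Softmax over payoffs: high temperature gives near-uniform
    # exploration, low temperature concentrates on the best payoff.
    exps = [math.exp(v / temperature) for v in values]
    z = sum(exps)
    return [e / z for e in exps]
```

Annealing the temperature downward over repeated plays is a common way to shift an agent from exploration toward exploitation, which supports the adaptive-learning behaviour the paper reports.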
Abstract: In order to improve driver convenience, an electronic tilt & tele column is applied to full-sized cars. Operating an electronic tilt & tele column requires two motors and one electronic controller. Because of these high-cost component parts, it is difficult to apply to a midsize car. Meanwhile, to cope with fuel-efficiency and emission regulations, motor-driven power steering (MDPS) has been applied to cars from small through full-sized. But an MDPS also consists of high-cost component parts (a motor and an electronic controller). This paper proposed an MDPS motor-driven electronic tilt & tele column system which has a single motor and an integrated electronic controller, and introduced the detailed design study and evaluation results.
Funding: supported by the National Basic Research Program (973 Program) of China (Grant No. 2007CB714000).
Abstract: In shield tunneling, the control system needs very reliable deviation-rectifying capability to ensure that the tunnel trajectory meets the permissible criterion. To this end, we present an approach that adopts Markov decision process (MDP) theory to plan the driving force with explicit representation of the uncertainty during excavation. The possible shield attitudes and the driving forces during excavation are discretized as a state set and an action set, respectively. In particular, an evaluation function is proposed that considers both the stability of the driving force and the deviation of the shield attitude. Unlike a deterministic approach, the driving forces based on the MDP model lead to an uncertain effect, and the attitude is known only with an imprecise probability. We consider the case where the transition probability varies within a domain estimated from field data, and discuss the optimal policy based on interval arithmetic. The validity of the approach is assessed by comparing the planned driving force with actual operating data from the field records of Line 9 in Tianjin. The comparison shows that the MDP model is accurate enough to predict the driving force for automatic deviation rectifying.
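The interval-arithmetic treatment of uncertain transition probabilities can be made concrete with a robust (worst-case) expectation: when each transition probability is known only to lie in an interval [low, high], the pessimistic value of an action is obtained by giving every successor state its lower bound and pouring the remaining probability mass into the lowest-valued successors. This is a standard robust-MDP building block, sketched here with hypothetical names rather than the paper's exact procedure:

```python
def worst_case_expectation(values, lows, highs):
    # Choose a distribution within [low, high] per successor state,
    # summing to 1, that minimizes the expected value: start every
    # state at its lower bound, then fill the lowest-value states
    # first with the remaining mass.
    p = list(lows)
    slack = 1.0 - sum(lows)
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        add = min(highs[i] - lows[i], slack)
        p[i] += add
        slack -= add
    return sum(pi * v for pi, v in zip(p, values))
```

Running value iteration with this backup in place of the usual expectation yields a policy that is optimal against the worst transition model consistent with the field data.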
Funding: supported by the National Natural Science Foundation of China (Grant No. 51275557) and the National Science and Technology Support Plan Projects of China (Grant No. 2013BAG14B01).
Abstract: Optimal energy management for a plug-in hybrid electric bus (PHEB) running along a fixed city bus route is an important technique for improving the vehicle's fuel economy and reducing its emissions. Exploiting the inherently high regularity of fixed bus routes, a continuous-state Markov decision process (MDP) is adopted, with the cost function defined as the total fuel and electricity consumption fee. A learning algorithm is then proposed to construct such an MDP model without knowing all of the MDP's parameters. Next, a fitted value iteration algorithm is given to approximate the cost function, with linear regression used within the fitted value iteration. Simulation results show that this approach is feasible for finding a control strategy for the PHEB, and that the method has advantages over the CDCS (charge-depleting/charge-sustaining) mode. Furthermore, a test based on a real PHEB was carried out to verify the applicability of the proposed method.
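Fitted value iteration with linear regression alternates Bellman backups at sampled states with a least-squares fit of the resulting targets. The toy sketch below (one-dimensional state, three actions, illustrative dynamics and stage cost, not the paper's PHEB model) shows the loop structure:

```python
def fit_linear(xs, ys):
    # Ordinary least squares for y ~ a*x + b in one dimension.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx

def fitted_value_iteration(states, step, cost, actions, gamma=0.9, iters=50):
    # Approximate the optimal cost-to-go as V(s) ~ a*s + b.
    a, b = 0.0, 0.0
    for _ in range(iters):
        # Bellman backup at each sampled state using the current fit.
        targets = [min(cost(s, u) + gamma * (a * step(s, u) + b)
                       for u in actions)
                   for s in states]
        a, b = fit_linear(states, targets)
    return a, b

# Toy problem: state in [0, 1], control nudges the state, stage cost = state.
step = lambda s, u: min(max(s + 0.1 * u, 0.0), 1.0)
cost = lambda s, u: s
a, b = fitted_value_iteration([0.0, 0.25, 0.5, 0.75, 1.0], step, cost, (-1, 0, 1))
```

Because the stage cost grows with the state, the fitted value function's slope comes out positive; richer feature sets replace `fit_linear` when a linear-in-state fit is too coarse.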
Funding: supported by the National Natural Science Foundation of China under Grant Nos. 61374080 and 61374067; the Natural Science Foundation of Zhejiang Province under Grant No. LY12F03010; the Natural Science Foundation of Ningbo under Grant No. 2012A610032; and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.
Abstract: This paper studies the strong n (n = -1, 0)-discount and finite-horizon criteria for continuous-time Markov decision processes in Polish spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. Under mild conditions, the authors prove the existence of strong n (n = -1, 0)-discount optimal stationary policies by developing two equivalence relations: one between the standard expected average reward and strong -1-discount optimality, and the other between the bias and strong 0-discount optimality. The authors also prove the existence of an optimal policy for a finite-horizon control problem by developing an interesting characterization of a canonical triplet.
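For context, the (non-strong) n-discount optimality criteria that these strong criteria refine are usually stated as follows, where $V_\alpha(\pi, s)$ denotes the expected $\alpha$-discounted reward of policy $\pi$ from initial state $s$. This is the standard textbook definition, given here only as background, not the paper's exact strong variant:

```latex
% A policy \pi^* is n-discount optimal (n = -1, 0, 1, \dots) if, for
% every policy \pi and every initial state s,
\liminf_{\alpha \downarrow 0} \; \alpha^{-n}
  \bigl( V_\alpha(\pi^*, s) - V_\alpha(\pi, s) \bigr) \;\ge\; 0 .
```

The cases n = -1 and n = 0 correspond, respectively, to average-reward optimality and bias optimality, which matches the two equivalence relations developed in the paper.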
Funding: supported by the National Natural Science Foundation of China under Grant Nos. 10925107 and 60874004.
Abstract: This paper studies denumerable continuous-time Markov decision processes with expected total reward criteria. The authors first study the unconstrained model with possibly unbounded transition rates, and give suitable conditions on the controlled system's primitive data under which they show the existence of a solution to the total reward optimality equation and the existence of an optimal stationary policy. Then, the authors impose a constraint on an expected total cost and consider the associated constrained model. Based on the results for the unconstrained model and using the Lagrange multiplier approach, the authors prove the existence of constrained-optimal policies under some additional conditions. Finally, the authors apply the results to controlled queueing systems.