Journal Articles
12 articles found
1. Optimal Control and Energy Scheduling for Linear Quadratic Gaussian Systems with Energy Constraints
Author: 杨修远. 《理论数学》 (Pure Mathematics), 2024, No. 4, pp. 375-383 (9 pages)
This paper considers optimal controller synthesis for a linear quadratic Gaussian (LQG) system whose feedback loop relies on energy transmission subject to energy-harvesting constraints. The system must choose how much energy to transmit to the controller, with each energy level carrying an operating cost and bounded by the harvesting constraint. The objective is to select the transmission energy and the controller jointly so as to strike the best balance between control performance and cost. Under certain assumptions, the problem decomposes into two optimization subproblems: one for optimal controller synthesis and one for optimal transmission-energy selection. The controller-synthesis subproblem is characterized by a Riccati equation, while the optimal transmission-energy policy is found by solving a Markov decision process (MDP). Finally, simulations verify the effectiveness of the proposed method.
Keywords: LQG optimal control; optimal energy transmission; Markov decision process
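The separation described in the abstract above splits the problem into a Riccati-equation controller subproblem and an MDP energy-selection subproblem. As a hedged illustration of the first half only, here is a minimal sketch that iterates the scalar discrete-time Riccati recursion to a fixed point; the plant numbers are illustrative, not taken from the paper.

```python
# Minimal sketch of the Riccati-equation subproblem: iterate the scalar
# discrete-time Riccati recursion for a plant x' = a*x + b*u + w until it
# reaches a fixed point, then read off the optimal feedback gain.
# All parameter values are illustrative, not the paper's.

def riccati_gain(a, b, q, r, iters=500):
    """Return (P, k): the Riccati fixed point and the gain for u = -k*x."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = a * b * p / (r + b * b * p)
    return p, k

p, k = riccati_gain(a=1.1, b=1.0, q=1.0, r=1.0)
# The closed-loop pole a - b*k should be inside the unit circle.
assert abs(1.1 - 1.0 * k) < 1.0
```

For this scalar case the fixed point can be checked by hand against the quadratic p^2 - (q*a*a/ ... ) relation; the iteration converges monotonically for a stabilizable, detectable pair.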
2. A dynamic model for railway freight overbooking (Cited: 2)
Authors: 冯芬玲, 张佳琪, 郭晓峰. Journal of Central South University, SCIE EI CAS CSCD, 2015, No. 8, pp. 3257-3264 (8 pages)
In order to apply the overbooking idea in the Chinese railway freight industry to improve revenue, a Markov decision process (dynamic programming) model for railway freight reservation was formulated, and the overbooking limit was proposed as a control policy. However, the dynamic programming treatment requires six nested loops, which is burdensome for real-world problems. To break through this computational limit, the properties of the value function were analyzed and the overbooking protection level was proposed to reduce the amount of calculation. Simulation experiments show that the overbooking protection level for the lower-fare class is higher than that for the higher-fare class, so the overbooking strategy is nested by fare class. Moreover, an analysis of how freight arrival and cancellation probabilities influence the overbooking strategy shows that the proposed approach is efficient and has good prospects for practical application. Compared with the existing first-come-first-served (FCFS) reservation scheme, the overbooking strategy performs better in reducing vacancies and improving revenue.
Keywords: revenue management; railway freight; overbooking; dynamic model
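As a hedged, toy-scale illustration of the booking-limit idea in the abstract above (not the paper's six-loop model): a single fare class, independent passenger show-ups, and a denied-boarding penalty. Accepting requests remains profitable until the expected marginal value of one more booking turns negative; all numbers are invented.

```python
# Toy overbooking-limit computation: one fare class, capacity `cap`, each
# booking shows up independently with probability p_show, and each show
# beyond capacity costs a denial penalty. All numbers are illustrative.
from math import comb

def expected_terminal(bookings, cap, fare, p_show, penalty):
    """Expected revenue at departure with `bookings` reservations on hand."""
    ev = 0.0
    for shows in range(bookings + 1):
        prob = comb(bookings, shows) * p_show**shows * (1 - p_show)**(bookings - shows)
        ev += prob * (fare * min(shows, cap) - penalty * max(shows - cap, 0))
    return ev

def overbooking_limit(cap=10, fare=100.0, p_show=0.8, penalty=300.0, max_book=20):
    """Accept bookings while one more has positive expected marginal value."""
    v = [expected_terminal(b, cap, fare, p_show, penalty) for b in range(max_book + 1)]
    limit = 0
    while limit < max_book and v[limit + 1] > v[limit]:
        limit += 1
    return limit

# With an 80% show-up rate it pays to sell slightly beyond capacity.
assert overbooking_limit() > 10
```

Raising the denial penalty pushes the limit back down toward physical capacity, which is the qualitative behaviour a railway overbooking policy has to trade off.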
3. Energy Aware Spectrum Access in Cognitive Radio Networks with Imperfect Channel Sensing (Cited: 2)
Authors: Wei Yifei, Du Chenhui, Feng Ruijun, F. Richard Yu. China Communications, SCIE CSCD, 2012, No. 9, pp. 127-130 (4 pages)
Due to the problems of spectrum underutilization and energy inefficiency in wireless communications, research on energy-efficient Cognitive Radio Networks (CRNs) has received significant attention in both industry and academia. In this paper, we consider the problem of optimal spectrum selection and transmission parameter design with the objective of minimizing energy consumption in CRNs. Since the system state cannot be directly observed due to missed detections and estimation errors, we formulate the optimal spectrum access problem as a Partially Observable Markov Decision Process (POMDP). In particular, the proposed scheme selects the optimal spectrum, modulation and coding scheme, transmission power, and link-layer frame size in each time slot according to the belief state, which captures all the history information of past actions and observations. The optimal policy can be acquired by solving the POMDP problem with a linear-programming-based algorithm. Simulation results show that significant energy savings can be achieved by the proposed scheme.
Keywords: spectrum access; cognitive radio; POMDP
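The belief state mentioned in the abstract above is maintained by Bayesian filtering. The following hedged sketch shows one predict-correct step for a single two-state channel (idle/busy) with an imperfect detector; the transition and sensing probabilities are invented for illustration.

```python
# One belief-state update for a two-state channel (idle/busy) whose evolution
# is a known Markov chain and whose sensing is imperfect. p_detect is the
# probability the sensor correctly flags a busy channel; p_false_alarm is the
# probability it flags an idle channel as busy. All numbers are illustrative.

def belief_update(b_idle, observed_idle, p_stay_idle=0.9, p_stay_busy=0.8,
                  p_detect=0.9, p_false_alarm=0.1):
    """One predict + Bayes-correct step on P(channel is idle)."""
    # Predict: propagate the belief through the channel's Markov chain.
    prior = b_idle * p_stay_idle + (1 - b_idle) * (1 - p_stay_busy)
    # Correct: fold in the (imperfect) sensing outcome.
    if observed_idle:
        num = prior * (1 - p_false_alarm)        # idle, no false alarm
        den = num + (1 - prior) * (1 - p_detect) # busy but missed
    else:
        num = prior * p_false_alarm              # idle but false alarm
        den = num + (1 - prior) * p_detect       # busy and detected
    return num / den

# Sensing "idle" should raise, and sensing "busy" lower, the idle belief.
assert belief_update(0.5, True) > 0.5 > belief_update(0.5, False)
```

A POMDP policy like the paper's then maps this scalar belief (one per channel) to the slot's spectrum and transmission-parameter choice.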
4. A novel dynamic call admission control policy for wireless network (Cited: 1)
Authors: 黄国盛, 陈志刚, 李庆华, 赵明, 郭真. Journal of Central South University, SCIE EI CAS, 2010, No. 1, pp. 110-116 (7 pages)
To address the issue of resource scarcity in wireless communication, a novel dynamic call admission control scheme for wireless mobile networks was proposed. The scheme establishes a reward-computing model of call admission for a wireless cell based on a Markov decision process, and dynamically optimizes the call admission process according to the principle of maximizing the average system reward. Extensive simulations were conducted to examine the performance of the model against other policies in terms of new-call blocking probability, handoff-call dropping probability, and resource utilization rate. Experimental results show that the proposed scheme adapts better to changes in traffic conditions than existing protocols. Under high call traffic load, the handoff-call dropping probability and new-call blocking probability can be reduced by about 8%, and the resource utilization rate can be improved by 2%-6%, reaching about 85%.
Keywords: wireless network; call admission control; quality of service; Markov decision process
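The paper's MDP policy is richer than any fixed threshold, but the trade-off it optimizes shows up already in the classic guard-channel scheme: reserving a few channels for handoffs lowers handoff dropping at the cost of some new-call blocking. A hedged Monte Carlo sketch with invented arrival, handoff, and departure rates:

```python
# Guard-channel trade-off in a single cell with C channels: new calls may only
# use channels below C - guard, handoffs may use any free channel. Per-step
# event probabilities are invented for illustration.
import random

def simulate(guard, C=10, p_new=0.35, p_handoff=0.15, p_depart=0.08, steps=100000):
    random.seed(7)
    busy, blocked_new, dropped_ho, new_tot, ho_tot = 0, 0, 0, 0, 0
    for _ in range(steps):
        # Departures: each busy channel frees independently this step.
        busy -= sum(random.random() < p_depart for _ in range(busy))
        r = random.random()
        if r < p_new:                       # a new-call arrival
            new_tot += 1
            if busy < C - guard:
                busy += 1
            else:
                blocked_new += 1
        elif r < p_new + p_handoff:         # a handoff arrival
            ho_tot += 1
            if busy < C:
                busy += 1
            else:
                dropped_ho += 1
    return blocked_new / new_tot, dropped_ho / ho_tot

b0, d0 = simulate(guard=0)
b2, d2 = simulate(guard=2)
# Guard channels protect handoffs at the expense of new-call blocking.
assert d2 < d0 and b2 > b0
```

An MDP-based policy like the one above effectively chooses where to sit on this blocking/dropping curve state by state, rather than fixing one guard value.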
5. Optimal Transmission Behaviour Policies of Secondary Users in Proactive-Optimization Cognitive Radio Networks (Cited: 2)
Authors: 司鹏搏, 于非, 王慧琪, 张延华. China Communications, SCIE CSCD, 2013, No. 8, pp. 1-17 (17 pages)
In Cognitive Radio (CR) networks, there is a common assumption that secondary devices always obey commands and are under full control. However, this assumption may become unrealistic for future CR networks with more intelligent, sophisticated and autonomous devices. Imperfect spectrum sensing and illegal behaviour of secondary users can result in harmful interference to primary users. In this paper, we propose a novel concept of Proactive-Optimization CR (POCR) networks, in which highly intelligent secondary users always try to proactively consider potentially harmful interference when making their behaviour decisions. Furthermore, we propose an optimal transmission behaviour decision scheme for secondary users in POCR networks that accounts for possible harmful interference and penalties from primary users. Specifically, we formulate the system as a Partially Observable Markov Decision Process (POMDP) problem. With this formulation, a low-complexity dynamic programming framework is presented to obtain the optimal behaviour policy. Extensive simulation results illustrate the significant performance improvement of the proposed scheme compared with an existing one that ignores the proactive optimization of secondary users.
Keywords: CR; proactive-optimization; user behaviour; POMDP
6. Optimal Response Learning and Its Convergence in Multiagent Domains
Authors: 张化祥, 黄上腾, 乐嘉锦. Journal of Donghua University (English Edition), EI CAS, 2005, No. 3, pp. 116-119 (4 pages)
In multiagent reinforcement learning, with different assumptions about the opponents' policies, an agent adopts quite different learning rules and attains different learning performance. We prove that, in multiagent domains, convergence of the Q values is guaranteed only when an agent behaves optimally and its opponents' strategies satisfy certain conditions, and that an agent achieves the best learning performance when it adopts the same learning algorithm as its opponents.
Keywords: multiagent; learning; policy
7. A Novel Dynamic Decision Model in 2-player Symmetric Repeated Games
Authors: Liu Weibing, Wang Xianjia, Wang Guangmin. Engineering Sciences, EI, 2008, No. 1, pp. 43-46 (4 pages)
Considering the dynamic character of repeated games and Markov processes, this paper presents a novel dynamic decision model for symmetric repeated games. In this model, players' actions are mapped to a Markov decision process with payoffs, and the Boltzmann distribution is introduced. Our dynamic model differs from others': we used it to study the iterated prisoner's dilemma, and the results show that this decision model can successfully be applied to symmetric repeated games and has adaptive learning ability.
Keywords: game theory; evolutionary game; repeated game; Markov process; decision model
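The Boltzmann distribution mentioned in the abstract above turns estimated payoffs into action probabilities, with a temperature parameter controlling the exploration/exploitation balance. A hedged sketch; the payoff values and temperature are illustrative, not the paper's.

```python
# Boltzmann (softmax) action selection: sample an action with probability
# proportional to exp(value / temperature). High temperature -> near-uniform
# exploration; low temperature -> near-greedy exploitation.
import math
import random

def boltzmann_choice(values, temperature):
    """Sample an action index i with probability ~ exp(values[i]/temperature)."""
    weights = [math.exp(v / temperature) for v in values]
    total = sum(weights)
    r = random.random() * total
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(values) - 1

# Illustrative estimated payoffs for the two actions of a symmetric game.
values = [3.0, 1.0]
random.seed(0)
greedy = sum(boltzmann_choice(values, 0.1) == 0 for _ in range(1000))
assert greedy > 950  # at T = 0.1 the higher-valued action dominates
```

Annealing the temperature downward over the repeated game is one common way such a model shifts from exploring opponent responses to exploiting the learned ones.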
8. Electric Tilt & Telescopic Column Driven by EPS Motor
Authors: Daesuk Jung, Soobo Park, Sungjin Moon, Minkwon Kim, Sungtaeg Oh. Journal of Mechanics Engineering and Automation, 2013, No. 10, pp. 627-631 (5 pages)
In order to improve driver convenience, an electric tilt & telescopic column is applied to full-sized cars. Operating the electric tilt & telescopic mechanism requires two motors and one electronic controller. Because of these high-cost components, the system is difficult to apply to midsize cars. Meanwhile, to cope with fuel-efficiency and emission regulations, motor-driven power steering has spread from small cars to full-sized cars. But MDPS (motor-driven power steering) likewise consists of high-cost components (a motor and an electronic controller). This paper proposes an electric tilt & telescopic column system driven by the MDPS motor, with a single motor and an integrated electronic controller, and presents the detailed design study and evaluation results.
Keywords: EPS system; tilt; telescopic; controller; electromagnetic clutch
9. Driving force planning in shield tunneling based on Markov decision processes (Cited: 7)
Authors: HU XiangTao, HUANG YongAn, YIN ZhouPing, XIONG YouLun. Science China (Technological Sciences), SCIE EI CAS, 2012, No. 4, pp. 1022-1030 (9 pages)
In shield tunneling, the control system needs a very reliable capability of deviation rectifying in order to ensure that the tunnel trajectory meets the permissible criterion. To this end, we present an approach that adopts Markov decision process (MDP) theory to plan the driving force with explicit representation of the uncertainty during excavation. The possible shield attitudes and the driving forces during excavation are discretized into a state set and an action set, respectively. In particular, an evaluation function is proposed that considers both the stability of the driving force and the deviation of the shield attitude. Unlike a deterministic approach, driving forces based on the MDP model lead to uncertain effects, and the attitude is known only with imprecise probability. We consider the case where the transition probability varies in a given domain estimated from field data, and discuss the optimal policy based on interval arithmetic. The validity of the approach is assessed by comparing the planned driving force with actual operating data from the field records of Line 9 in Tianjin. It is shown that the MDP model is reasonable enough to predict the driving force for automatic deviation rectifying.
Keywords: shield tunneling; Markov decision process; automatic deviation rectifying; interval arithmetic; driving force planning
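A hedged, two-state toy version of the interval-probability planning in the abstract above: each action's transition probability is only known to lie in an interval, and value iteration backs up against the worst case. Because the backup is linear in the probability, only the interval endpoints need checking. The states, actions, and numbers below are invented, not the paper's.

```python
# Pessimistic (worst-case) value iteration over interval transition
# probabilities. Two states: 0 = on-trajectory, 1 = deviated. For each action,
# P(next state = 0) is only known to lie in [p_lo, p_hi]. Numbers illustrative.

def robust_value_iteration(gamma=0.9, iters=200):
    actions = {  # action -> (p_lo, p_hi, reward in state 0, reward in state 1)
        "steady": (0.70, 0.90, 1.0, 0.0),
        "correct": (0.85, 0.95, 0.8, 0.2),
    }
    v = [0.0, 0.0]
    for _ in range(iters):
        new_v = []
        for s in (0, 1):
            best = -float("inf")
            for p_lo, p_hi, r0, r1 in actions.values():
                r = r0 if s == 0 else r1
                # Adversarial nature picks p in [p_lo, p_hi] minimizing value;
                # the backup is linear in p, so an endpoint attains the min.
                worst = min(p * v[0] + (1 - p) * v[1] for p in (p_lo, p_hi))
                best = max(best, r + gamma * worst)
            new_v.append(best)
        v = new_v
    return v

v = robust_value_iteration()
assert v[0] > v[1]  # staying on-trajectory is worth more, even pessimistically
```

The paper works with a much larger discretization of shield attitudes and forces, but the endpoint argument for interval-valued transition probabilities carries over unchanged.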
10. A learning method for energy optimization of the plug-in hybrid electric bus (Cited: 7)
Authors: SUN Yong, CHEN Zheng, YAN BingJie, YOU SiXiong. Science China (Technological Sciences), SCIE EI CAS CSCD, 2015, No. 7, pp. 1242-1249 (8 pages)
Optimal energy management for a plug-in hybrid electric bus (PHEB) running along a fixed city bus route is an important technique for improving the vehicle's fuel economy and reducing emissions. Considering the inherently high regularity of fixed bus routes, a continuous-state Markov decision process (MDP) is adopted to describe the cost function as the total gas and electricity consumption fee. A learning algorithm is then proposed to construct such an MDP model without knowing all the parameters of the MDP. Next, a fitted value iteration algorithm is given to approximate the cost function, with linear regression used inside the fitted value iteration. Simulation results show that this approach is feasible for finding the control strategy of a PHEB and has advantages over the CDCS (charge-depleting/charge-sustaining) mode. Furthermore, a test based on a real PHEB was carried out to verify the applicability of the proposed method.
Keywords: plug-in hybrid electric bus (PHEB); control strategy; dynamic programming (DP); learning algorithm
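A hedged, one-dimensional sketch of the fitted value iteration with linear regression named in the abstract above: the state is a battery state-of-charge (SOC) in [0, 1], the cost-to-go is approximated as an affine function of the state, and each sweep regresses Bellman-backup targets onto the features. The dynamics, costs, and features are invented stand-ins for the PHEB model.

```python
# Fitted value iteration with a linear function approximator on a 1-D toy
# state (battery SOC). Each sweep: (1) compute Bellman-backup targets on
# sampled states, (2) refit V(s) ~ theta0 + theta1*s by least squares.
# All dynamics and costs are illustrative, not the paper's.
import random

def fitted_value_iteration(n_samples=200, sweeps=60, gamma=0.9):
    random.seed(1)
    states = [random.random() for _ in range(n_samples)]  # sampled SOC values
    theta = (0.0, 0.0)

    def v(s):
        return theta[0] + theta[1] * s

    actions = (-0.1, 0.1)  # discharge the battery vs. charge via the engine
    for _ in range(sweeps):
        targets = []
        for s in states:
            backups = []
            for a in actions:
                s2 = min(1.0, max(0.0, s + a))
                # fuel cost when charging, plus a mild low-SOC penalty
                cost = (0.5 if a > 0 else 0.0) + 0.2 * (1.0 - s2)
                backups.append(cost + gamma * v(s2))
            targets.append(min(backups))
        # closed-form least squares of targets against features (1, s)
        n = len(states)
        sx, sy = sum(states), sum(targets)
        sxx = sum(x * x for x in states)
        sxy = sum(x * y for x, y in zip(states, targets))
        slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
        theta = ((sy - slope * sx) / n, slope)
    return theta

theta0, theta1 = fitted_value_iteration()
# Higher state of charge should mean lower cost-to-go: a negative slope.
assert theta1 < 0
```

With richer (e.g. polynomial) features the same loop applies; the regression step is what lets the method handle the continuous state space the abstract mentions.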
11. Strong n-Discount and Finite-Horizon Optimality for Continuous-Time Markov Decision Processes (Cited: 1)
Authors: ZHU Quanxin, GUO Xianping. Journal of Systems Science & Complexity, SCIE EI CSCD, 2014, No. 5, pp. 1045-1063 (19 pages)
This paper studies the strong n (n = -1, 0)-discount and finite-horizon criteria for continuous-time Markov decision processes in Polish spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. Under mild conditions, the authors prove the existence of strong n (n = -1, 0)-discount optimal stationary policies by developing two equivalence relations: one between the standard expected average reward and strong -1-discount optimality, and the other between the bias and strong 0-discount optimality. The authors also prove the existence of an optimal policy for a finite-horizon control problem by developing an interesting characterization of a canonical triplet.
Keywords: continuous-time Markov decision process; expected average reward criterion; finite-horizon optimality; Polish space; strong n-discount optimality
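For orientation, one common formalization of the criterion studied above; the notation here is a standard one and may differ in details from the paper's.

```latex
% Discounted value of policy \pi from state x, with discount rate \alpha > 0
% and reward rate r:
\[
  V_\alpha(x,\pi) \;=\; \mathbb{E}_x^{\pi}\!\left[\int_0^{\infty} e^{-\alpha t}\, r(x_t, a_t)\, dt\right].
\]
% A policy \pi^* is strong n-discount optimal (n = -1, 0) if, for every
% policy \pi and every state x,
\[
  \liminf_{\alpha \downarrow 0} \; \alpha^{-n}\,\bigl[\, V_\alpha(x,\pi^*) - V_\alpha(x,\pi) \,\bigr] \;\ge\; 0 .
\]
% Taking n = -1 recovers average-reward optimality in the vanishing-discount
% limit (the first equivalence in the abstract); n = 0 refines it via the bias.
```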
12. Total Reward Criteria for Unconstrained/Constrained Continuous-Time Markov Decision Processes
Authors: Xianping GUO, Lanlan ZHANG. Journal of Systems Science & Complexity, SCIE EI CSCD, 2011, No. 3, pp. 491-505 (15 pages)
This paper studies denumerable continuous-time Markov decision processes with expected total reward criteria. The authors first study the unconstrained model with possibly unbounded transition rates, and give suitable conditions on the controlled system's primitive data under which they show the existence of a solution to the total reward optimality equation and of an optimal stationary policy. Then the authors impose a constraint on an expected total cost and consider the associated constrained model. Based on the results for the unconstrained model and using the Lagrange multiplier approach, they prove the existence of constrained-optimal policies under some additional conditions. Finally, the results are applied to controlled queueing systems.
Keywords: constrained-optimal policy; continuous-time Markov decision process; optimal policy; total reward criterion; unbounded reward/cost and transition rates