Journal Articles
84 articles found
Modeling and Design of Real-Time Pricing Systems Based on Markov Decision Processes (Cited: 4)
1
Authors: Koichi Kobayashi, Ichiro Maruta, Kazunori Sakurama, Shun-ichi Azuma 《Applied Mathematics》 2014, Issue 10, pp. 1485-1495 (11 pages)
A real-time pricing system of electricity is a system that charges different electricity prices for different hours of the day and for different days, and is effective for reducing the peak and flattening the load curve. In this paper, using a Markov decision process (MDP), we propose a modeling method and an optimal control method for real-time pricing systems. First, the outline of real-time pricing systems is explained. Next, a model of a set of customers is derived as a multi-agent MDP. Furthermore, the optimal control problem is formulated, and is reduced to a quadratic programming problem. Finally, a numerical simulation is presented.
Keywords: Markov decision process; optimal control; real-time pricing system
Variance minimization for continuous-time Markov decision processes: two approaches (Cited: 1)
2
Author: ZHU Quan-xin 《Applied Mathematics (A Journal of Chinese Universities)》 SCIE CSCD, 2010, Issue 4, pp. 400-410 (11 pages)
This paper studies the limit average variance criterion for continuous-time Markov decision processes in Polish spaces. Based on two approaches, this paper proves not only the existence of solutions to the variance minimization optimality equation and the existence of a variance minimal policy that is canonical, but also the existence of solutions to the two variance minimization optimality inequalities and the existence of a variance minimal policy which may not be canonical. An example is given to illustrate all of our conditions.
Keywords: continuous-time Markov decision process; Polish space; variance minimization; optimality equation; optimality inequality
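As background for the criterion named in the title (a standard formulation stated here as context, not quoted from the paper): for a policy $\pi$ with long-run average cost $\eta(\pi,x)$, the limit average variance is

```latex
\eta(\pi,x) = \limsup_{T\to\infty} \frac{1}{T}\,
  \mathbb{E}^{\pi}_{x}\!\left[\int_0^T c(\xi_t,a_t)\,dt\right],
\qquad
V(\pi,x) = \limsup_{T\to\infty} \frac{1}{T}\,
  \mathbb{E}^{\pi}_{x}\!\left[\int_0^T \bigl(c(\xi_t,a_t)-\eta(\pi,x)\bigr)^{2}\,dt\right],
```

and a variance minimal policy minimizes $V$ among the policies that already attain the optimal average cost $\eta$.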
Variance Optimization for Continuous-Time Markov Decision Processes
3
Author: Yaqing Fu 《Open Journal of Statistics》 2019, Issue 2, pp. 181-195 (15 pages)
This paper considers the variance optimization problem of the average reward in a continuous-time Markov decision process (MDP). It is assumed that the state space is countable and the action space is a Borel measurable space. The main purpose of this paper is to find the policy with the minimal variance in the deterministic stationary policy space. Unlike the traditional Markov decision process, the cost function in the variance criterion is affected by future actions. To this end, we convert the variance minimization problem into a standard MDP by introducing a concept called pseudo-variance. Further, by giving a policy iteration algorithm for the pseudo-variance optimization problem, the optimal policy of the original variance optimization problem is derived, and a sufficient condition for the variance optimal policy is given. Finally, we use an example to illustrate the conclusions of this paper.
Keywords: continuous-time Markov decision process; variance optimality of average reward; variance optimal policy; policy iteration
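The policy iteration routine referred to above can be sketched on a small discrete-time analogue. This is a generic illustration with invented transition data, not the paper's continuous-time pseudo-variance construction: evaluate the current policy, then improve it greedily, until it stabilizes.

```python
# Generic policy iteration on a tiny discounted MDP (toy data, for illustration).
# P[s][a] is a list of (next_state, prob); R[s][a] is the one-step reward.
P = {0: {0: [(0, 0.9), (1, 0.1)], 1: [(0, 0.2), (1, 0.8)]},
     1: {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]}}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 0.5}}
GAMMA = 0.9

def evaluate(policy, tol=1e-10):
    """Iterative policy evaluation: V(s) = R[s][pi(s)] + gamma * E[V(s')]."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            a = policy[s]
            v = R[s][a] + GAMMA * sum(p * V[t] for t, p in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def policy_iteration():
    policy = {s: 1 for s in P}          # deliberately start from a poor policy
    while True:
        V = evaluate(policy)
        stable = True
        for s in P:                      # greedy improvement step
            best = max(P[s], key=lambda a: R[s][a]
                       + GAMMA * sum(p * V[t] for t, p in P[s][a]))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return policy, V

pi_star, V_star = policy_iteration()
```

Here improvement flips both states to action 0 after one round and then stabilizes; the same evaluate/improve alternation is what the pseudo-variance algorithm performs on its transformed cost.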
Stability Estimation for Markov Control Processes with Discounted Cost (Cited: 1)
4
Author: Jaime Eduardo Martínez-Sánchez 《Applied Mathematics》 2020, Issue 6, pp. 491-509 (19 pages)
This article explores controllable, stationary, homogeneous Markov processes on Borel spaces, in discrete time with infinite horizon, with bounded cost functions, under the expected total discounted cost criterion. The stability estimation problem for this type of process is posed. The central objective is to obtain a bound on the stability index expressed in terms of the Lévy-Prokhorov metric; likewise, sufficient conditions are provided for the existence of such inequalities.
Keywords: discrete-time Markov control process; expected total discounted cost; stability index; probabilistic metric; Lévy-Prokhorov metric
Asymptotic Evaluations of the Stability Index for a Markov Control Process with the Expected Total Discounted Reward Criterion
5
Author: Jaime Eduardo Martínez-Sánchez 《American Journal of Operations Research》 2021, Issue 1, pp. 62-85 (24 pages)
In this work, for a controlled consumption-investment process with the discounted reward optimization criterion, a numerical estimate of the stability index is made. Using explicit formulas for the optimal stationary policies and for the value functions, the stability index is explicitly calculated, and its asymptotic behavior as the discount coefficient approaches 1 is investigated through statistical techniques and numerical experiments. The results define the conditions under which an approximately optimal stationary policy can be used to control the original process.
Keywords: controlled consumption-investment process; discrete-time Markov control process; expected total discounted reward; probabilistic metrics; stability index estimation
Characteristic Patterns of Uncertainty Policies in Markov Decision Processes (Cited: 2)
6
Authors: 黄镇谨, 陆阳, 杨娟, 方欢 《计算机科学》 (Computer Science) CSCD, PKU Core, 2013, Issue 4, pp. 263-266 (4 pages)
Markov decision processes can model complex systems exhibiting uncertainty, and model analysis requires policies to resolve this uncertainty. First, the time-space bounded reachability probability problem under different policies is studied, and a definition and classification of uncertainty-resolving policies are given. Second, under time-independent policies, the consistency of time-space bounded reachability probabilities based on deterministic and randomized action selection is proved, and it is shown that time-dependent policies achieve better time-space bounded reachability probabilities than time-independent ones. Finally, an example briefly illustrates the correctness of the conclusions.
Keywords: Markov decision process; uncertainty policy; time-space bounded reachability probability
Stochastic Model Checking of Continuous-Time Markov Processes (Cited: 2)
7
Authors: 钮俊, 曾国荪, 吕新荣, 徐畅 《计算机科学》 (Computer Science) CSCD, PKU Core, 2011, Issue 9, pp. 112-115, 125 (5 pages)
Functional correctness and acceptable performance are two essential aspects of the trustworthiness of complex systems. Combining qualitative verification with quantitative analysis, we perform functional verification and performance analysis of complex concurrent systems to assess their trustworthiness in a unified way. Continuous-time Markov decision processes (CTMDPs) can uniformly capture important features of complex systems such as probabilistic choice, stochastic timing, and nondeterminism. We propose using CTMDPs as the model for both qualitative verification and quantitative analysis, reduce the functional verification and performance analysis of a complex system to computing reachability probabilities in the CTMDP, and prove the correctness of the verification procedure. Model checking is finally carried out with the model checker MRMC (Markov Reward Model Checker). Theoretical analysis shows that the proposed verification requirements for the CTMDP model are necessary, and that the verification approach is feasible.
Keywords: functionality and performance; continuous-time Markov decision process; model checking; trustworthiness verification; reachability probability
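The reachability probabilities that the verification reduces to can be illustrated on a discrete-time analogue. The sketch below uses toy transition data and plain fixed-point iteration rather than CTMDP machinery or MRMC; it computes the maximal probability of reaching a target set.

```python
# Maximal reachability probability in a small MDP via fixed-point iteration
# (discrete-time illustration with invented data, not the paper's CTMDP model).
TARGET = {2}
# trans[s][action] = list of (next_state, prob); 2 is the absorbing target,
# 3 an absorbing "fail" state.
trans = {
    0: {"a": [(1, 0.6), (3, 0.4)], "b": [(0, 0.5), (2, 0.5)]},
    1: {"a": [(2, 0.8), (3, 0.2)]},
    2: {"stay": [(2, 1.0)]},
    3: {"stay": [(3, 1.0)]},
}

def max_reach_prob(trans, target, iters=1000):
    """Iterate p(s) = max_a sum_s' P(s,a,s') p(s'), with p fixed at 1 on target."""
    p = {s: (1.0 if s in target else 0.0) for s in trans}
    for _ in range(iters):
        for s in trans:
            if s in target:
                continue
            p[s] = max(sum(q * p[t] for t, q in succ)
                       for succ in trans[s].values())
    return p

p = max_reach_prob(trans, TARGET)
```

In this toy instance the scheduler at state 0 prefers the self-looping action "b", whose repeated 0.5 chance of hitting the target drives the maximal reachability probability to 1.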
Multi-Stage Decision-Making for Power Generation Companies Based on Discrete-Time Markov Decision Processes (Cited: 2)
8
Authors: 张宏刚, 宋依群 《上海交通大学学报》 (Journal of Shanghai Jiao Tong University) EI, CAS, CSCD, PKU Core, 2004, Issue 8, pp. 1238-1240, 1245 (4 pages)
A discrete-time Markov decision process (DTMDP) is used to study the decision problem of a power generation company aiming to maximize total profit over multiple stages. In a market environment, depending on its own conditions, a generation company's competitive strategy can be that of a price taker or a price maker. Considering the transition probabilities between market equilibrium states under the company's different strategies, multi-stage decision models are given for the company as a price taker and as a price maker, respectively. A numerical example verifies the effectiveness and feasibility of the proposed models.
Keywords: electricity market; discrete-time Markov decision process; decision problem
Verification of Continuous-Time Markov Processes under Timed Policies (Cited: 1)
9
Authors: 黄镇谨, 陈波, 欧阳浩 《广西科技大学学报》 (Journal of Guangxi University of Science and Technology) CAS, 2014, Issue 3, pp. 59-62, 86 (5 pages)
Verifying a system model is key to ensuring system safety. Continuous-time Markov processes can capture the stochastic, probabilistic, and nondeterministic features of complex concurrent systems. A verification method for continuous-time Markov processes under time-dependent policies is proposed: the continuous-time Markov process is transformed into an interactive Markov chain, transformation rules are given for both the model and the nondeterminism-resolving policies, and verification is finally achieved by computing the extremal time-bounded reachability probabilities of the interactive Markov chain. Theoretical analysis shows the feasibility of the proposed method.
Keywords: Markov decision process; interactive Markov chain; time-bounded reachability probability; timed policy
A Two-Timescale Simulation Algorithm for Markov Control Processes Based on Performance Potentials
10
Authors: 鲍秉坤, 殷保群, 奚宏生 《系统仿真学报》 (Journal of System Simulation) CAS, CSCD, PKU Core, 2009, Issue 13, pp. 4114-4119 (6 pages)
Introducing the notion of two timescales into performance-potential-based stochastic approximation, a two-timescale simulation gradient algorithm for discrete-time Markov control processes based on performance potentials is proposed. It remedies the shortcomings of traditional algorithms, in which per-step updating updates too frequently while per-cycle updating updates too infrequently. Three numerical examples illustrate the advantages of the two-timescale updating algorithm in computational complexity, convergence speed, and convergence accuracy.
Keywords: Markov control process; performance potential; two timescales; stochastic approximation
Average Sample-path Optimality for Continuous-time Markov Decision Processes in Polish Spaces
11
Author: Quan-xin ZHU 《Acta Mathematicae Applicatae Sinica》 SCIE CSCD, 2011, Issue 4, pp. 613-624 (12 pages)
In this paper we study the average sample-path cost (ASPC) problem for continuous-time Markov decision processes in Polish spaces. To the best of our knowledge, this paper is a first attempt to study the ASPC criterion on continuous-time MDPs with Polish state and action spaces. The corresponding transition rates are allowed to be unbounded, and the cost rates may have neither upper nor lower bounds. Under some mild hypotheses, we prove the existence of (ε > 0)-ASPC optimal stationary policies based on two different approaches: one is the "optimality equation" approach and the other is the "two optimality inequalities" approach.
Keywords: continuous-time Markov decision process; average sample-path optimality; Polish space; optimality equation; optimality inequality
Convergence of Controlled Models for Continuous-Time Markov Decision Processes with Constrained Average Criteria
12
Authors: Wenzhao Zhang, Xianzhu Xiong 《Annals of Applied Mathematics》 2019, Issue 4, pp. 449-464 (16 pages)
This paper attempts to study the convergence of optimal values and optimal policies of continuous-time Markov decision processes (CTMDPs for short) under constrained average criteria. For a given original model M_∞ of a CTMDP with denumerable states and a sequence {M_n} of CTMDPs with finite states, we give a new convergence condition to ensure that the optimal values and optimal policies of {M_n} converge to the optimal value and optimal policy of M_∞ as the state space S_n of M_n converges to the state space S_∞ of M_∞. The transition rates and cost/reward functions of M_∞ are allowed to be unbounded. Our approach can be viewed as a combination of linear programming and Lagrange multipliers.
Keywords: continuous-time Markov decision processes; optimal value; optimal policies; constrained average criteria; occupation measures
Optimal Control of Service Rates of Discrete-Time (s,Q) Queueing-Inventory Systems with Finite Buffer
13
Authors: L. Iniya, B. Sivakumar, G. Arivarignan 《Journal of Systems Science and Systems Engineering》 SCIE EI CSCD, 2024, Issue 3, pp. 261-280 (20 pages)
In this article, we develop an optimal policy to control the service rate of a discrete-time queueing-inventory system with a finite buffer. Customers arrive according to a Bernoulli process, and the service times are geometric. Whenever the buffer reaches its maximum size, newly arriving customers are lost. Customers are served one by one according to the FCFS rule, and each customer requests a random number of items. The inventory is replenished according to an (s,Q) inventory policy with geometric lead time. The main objectives of this article are to determine the service rates to be employed at each slot so that the long-run expected cost rate is minimized for a fixed inventory level and buffer size, and to minimize the expected waiting time for a fixed inventory level and buffer size. The problems are modelled as Markov decision problems. We establish the existence of a stationary policy and employ the linear programming method to find the optimal service rates. We provide some numerical examples to illustrate the behaviour of the model.
Keywords: queueing-inventory system; discrete time; batch demand; Markov decision process
Age-Driven Joint Sampling and Non-Slot Based Scheduling for Industrial Internet of Things
14
Authors: Cao Yali, Teng Yinglei, Song Mei, Wang Nan 《China Communications》 SCIE CSCD, 2024, Issue 11, pp. 190-204 (15 pages)
Effective control of time-sensitive industrial applications depends on the real-time transmission of data from underlying sensors. Quantifying data freshness through the age of information (AoI), in this paper we jointly design sampling and non-slot based scheduling policies to minimize the maximum time-average age of information (MAoI) among sensors, under constraints of average energy cost and finite queue stability. To overcome the intractability arising from the high coupling of such a complex stochastic process, we first focus on the single-sensor time-average AoI optimization problem and convert the constrained Markov decision process (CMDP) into an unconstrained Markov decision process (MDP) by the Lagrangian method. With the infinite-horizon average energy and AoI expressions expanded as a Bellman equation, the single-sensor time-average AoI optimization problem can be approached through the steady-state distribution probability. Further, we propose a low-complexity sub-optimal sampling and semi-distributed scheduling scheme for the multi-sensor scenario. Simulation results show that the proposed scheme reduces the MAoI significantly while achieving a balance between the sampling rate and the service rate for multiple sensors.
Keywords: age of information (AoI); Industrial Internet of Things (IIoT); Markov decision process (MDP); time-sensitive systems; URLLC
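The CMDP-to-MDP conversion mentioned above folds the energy constraint into the objective with a Lagrange multiplier λ, giving an unconstrained per-step cost of the form aoi(s, a) + λ·energy(s, a). The sketch below illustrates this on an invented three-state age model with deterministic dynamics and a fixed λ — all names and numbers here are assumptions for illustration, not the paper's model.

```python
# Lagrangian relaxation of a constrained MDP (illustrative toy model).
STATES = [0, 1, 2]            # age levels, capped at 2
ACTIONS = ["idle", "sample"]

def step(s, a):
    """Toy dynamics: sampling resets the age, idling grows it (capped)."""
    return 0 if a == "sample" else min(s + 1, 2)

def aoi_cost(s, a):
    return float(s)            # cost grows with age

def energy_cost(s, a):
    return 1.0 if a == "sample" else 0.0

def lagrangian_policy(lam, gamma=0.9, iters=500):
    """Value iteration on the unconstrained cost aoi + lam * energy."""
    V = {s: 0.0 for s in STATES}
    for _ in range(iters):
        V = {s: min(aoi_cost(s, a) + lam * energy_cost(s, a)
                    + gamma * V[step(s, a)] for a in ACTIONS)
             for s in STATES}
    return {s: min(ACTIONS, key=lambda a: aoi_cost(s, a)
                   + lam * energy_cost(s, a) + gamma * V[step(s, a)])
            for s in STATES}

# A small multiplier makes sampling cheap (keep the age low);
# a large multiplier makes the policy conserve energy.
eager = lagrangian_policy(lam=0.1)
frugal = lagrangian_policy(lam=100.0)
```

In the full method λ is then tuned so that the relaxed policy meets the energy budget; here it is simply fixed to show the two regimes.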
First Passage Risk Probability Minimization for Piecewise Deterministic Markov Decision Processes (Cited: 1)
15
Authors: Xin WEN, Hai-feng HUO, Xian-ping GUO 《Acta Mathematicae Applicatae Sinica》 SCIE CSCD, 2022, Issue 3, pp. 549-567 (19 pages)
This paper is an attempt to study the minimization problem of the risk probability of piecewise deterministic Markov decision processes (PDMDPs) with unbounded transition rates and Borel spaces. Different from the expected discounted and average criteria in the existing literature, we consider the risk probability that the total rewards produced by a system do not exceed a prescribed goal during a first passage time to some target set, and aim to find a policy that minimizes the risk probability over the class of all history-dependent policies. Under suitable conditions, we derive the optimality equation (OE) for the probability criterion, prove that the value function of the minimization problem is the unique solution to the OE, and establish the existence of ε(≥ 0)-optimal policies. Finally, we provide two examples to illustrate our results.
Keywords: piecewise deterministic Markov decision processes; risk probability; first passage time; ε-optimal policy
First passage Markov decision processes with constraints and varying discount factors (Cited: 2)
16
Authors: Xiao WU, Xiaolong ZOU, Xianping GUO 《Frontiers of Mathematics in China》 SCIE CSCD, 2015, Issue 4, pp. 1005-1023 (19 pages)
This paper focuses on the constrained optimality problem (COP) of first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multi-constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of a so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear program on the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear program, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results.
Keywords: discrete-time Markov decision process (DTMDP); constrained optimality; varying discount factor; unbounded cost
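For the finite state-and-action case the paper derives an exact optimal policy via the occupation-measure linear program. As a much cruder but self-contained illustration (toy data; note also that a true constrained optimum may require randomization, which this scan ignores), one can enumerate the deterministic stationary policies of a tiny constrained discounted MDP, discard the infeasible ones, and keep the cheapest.

```python
# Brute-force illustration of constrained discounted optimality on a toy DTMDP.
from itertools import product

STATES = [0, 1]
ACTIONS = ["fast", "slow"]
GAMMA = 0.8
# Deterministic toy dynamics, a cost to minimize, and a constraint cost with budget.
NEXT = {0: {"fast": 1, "slow": 0}, 1: {"fast": 0, "slow": 1}}
COST = {0: {"fast": 2.0, "slow": 1.0}, 1: {"fast": 1.0, "slow": 3.0}}
CCOST = {0: {"fast": 1.0, "slow": 0.0}, 1: {"fast": 1.0, "slow": 0.0}}
BUDGET = 3.0

def discounted(policy, cost, s0=0, iters=400):
    """Discounted total of `cost` under `policy`; a plain rollout suffices
    because the toy dynamics are deterministic."""
    total, s, g = 0.0, s0, 1.0
    for _ in range(iters):
        a = policy[s]
        total += g * cost[s][a]
        g *= GAMMA
        s = NEXT[s][a]
    return total

feasible = []
for choice in product(ACTIONS, repeat=len(STATES)):
    policy = dict(zip(STATES, choice))
    if discounted(policy, CCOST) <= BUDGET:   # constraint check
        feasible.append((discounted(policy, COST), policy))

best_cost, best_policy = min(feasible, key=lambda t: t[0])
```

The occupation-measure LP replaces this exponential enumeration with a single linear program whose variables are the discounted state-action frequencies; the enumeration above merely makes the feasibility-versus-cost trade-off concrete.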
Convergence of Markov decision processes with constraints and state-action dependent discount factors (Cited: 2)
17
Authors: Xiao Wu, Xianping Guo 《Science China Mathematics》 SCIE CSCD, 2020, Issue 1, pp. 167-182 (16 pages)
This paper is concerned with the convergence of a sequence of discrete-time Markov decision processes (DTMDPs) with constraints, state-action dependent discount factors, and possibly unbounded costs. Using the convex analytic approach under mild conditions, we prove that the optimal values and optimal policies of the original DTMDPs converge to those of the "limit" one. Furthermore, we show that any countable-state DTMDP can be approximated by a sequence of finite-state DTMDPs, which are constructed using the truncation technique. Finally, we illustrate the approximation by solving a controlled queueing system numerically, and give the corresponding error bound of the approximation.
Keywords: discrete-time Markov decision processes; state-action dependent discount factors; unbounded costs; convergence
First Passage Models for Denumerable Semi-Markov Decision Processes with Nonnegative Discounted Costs (Cited: 2)
18
Authors: Yong-hui Huang, Xian-ping Guo 《Acta Mathematicae Applicatae Sinica》 SCIE CSCD, 2011, Issue 2, pp. 177-190 (14 pages)
This paper considers a first passage model for discounted semi-Markov decision processes with denumerable states and nonnegative costs. The criterion to be optimized is the expected discounted cost incurred during a first passage time to a given target set. We first construct a semi-Markov decision process under a given semi-Markov decision kernel and a policy. Then, we prove that the value function satisfies the optimality equation and that there exists an optimal (or ε-optimal) stationary policy under suitable conditions, using a minimum nonnegative solution approach. Further, we give some properties of optimal policies. In addition, a value iteration algorithm for computing the value function and optimal policies is developed, and an example is given. Finally, it is shown that our model extends the first passage models for both discrete-time and continuous-time Markov decision processes.
Keywords: semi-Markov decision processes; target set; first passage time; discounted cost; optimal policy
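The value iteration algorithm mentioned above can be sketched in a discrete-time setting (an assumption for illustration — the paper's model is semi-Markov with random sojourn times): iterate V(s) = min_a [c(s, a) + α Σ_{s'∉B} P(s'|s, a) V(s')], with cost ceasing once the target set B is reached.

```python
# First-passage expected discounted cost to a target set B on a toy MDP
# (discrete-time sketch with invented data, not the paper's semi-Markov model).
ALPHA = 0.9
B = {2}   # target set: cost stops accruing on arrival
trans = {
    0: {"risky": [(2, 0.5), (1, 0.5)], "safe": [(1, 1.0)]},
    1: {"risky": [(2, 0.7), (0, 0.3)], "safe": [(0, 1.0)]},
}
cost = {0: {"risky": 4.0, "safe": 1.0}, 1: {"risky": 4.0, "safe": 1.0}}

def first_passage_vi(iters=2000):
    """Synchronous value iteration; states in B contribute zero continuation."""
    V = {s: 0.0 for s in trans}
    for _ in range(iters):
        V = {s: min(cost[s][a]
                    + ALPHA * sum(p * (0.0 if t in B else V[t])
                                  for t, p in trans[s][a])
                    for a in trans[s])
             for s in trans}
    return V

V = first_passage_vi()
```

The trade-off the iteration resolves is whether to pay a high per-step cost for a good chance of reaching the target now ("risky") or a low cost that defers the passage ("safe"); here the optimum mixes the two across states.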
A Q-Learning Algorithm for Dynamic Optimal Allocation of CPS Regulation Commands in Interconnected Power Grids (Cited: 25)
19
Authors: 余涛, 王宇名, 刘前进 《中国电机工程学报》 (Proceedings of the CSEE) EI, CSCD, PKU Core, 2010, Issue 7, pp. 62-69 (8 pages)
Under the control performance standard (CPS), the dynamic optimal allocation of automatic generation control (AGC) commands (CPS commands) from the dispatch center of an interconnected grid to the various types of AGC units is a stochastic optimization problem. The continuous allocation of CPS commands is discretized and viewed as a discrete-time Markov decision process, and a dynamic control method based on Q-learning is proposed. Different reward functions are designed according to different optimization objectives and incorporated into the algorithm, effectively combining the regulation characteristics of hydro and thermal units and accounting for the regulation margin of hydro units, thereby improving the regulation capability of the AGC system. Simulation studies on a standard two-area model and on the China Southern Power Grid model, compared against genetic algorithms and practical engineering methods, show that Q-learning effectively improves the system's adaptability, robustness, and CPS compliance rate.
Keywords: Q-learning; stochastic optimization; discrete-time Markov decision process; control performance standard; automatic generation control
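The Q-learning method at the core of the proposed allocation scheme uses the standard tabular update Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') − Q(s,a)]. The sketch below runs it on an invented two-state task — the states, rewards, and parameters are stand-ins for illustration, not the paper's CPS reward design.

```python
# Tabular Q-learning with epsilon-greedy exploration on a toy two-state task.
import random

random.seed(0)
STATES, ACTIONS = [0, 1], [0, 1]
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def env_step(s, a):
    """Toy dynamics: the 'right' action in each state earns reward and
    moves to the other state; the other action earns nothing."""
    if s == 0 and a == 1:
        return 1, 1.0
    if s == 1 and a == 0:
        return 0, 1.0
    return s, 0.0

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
s = 0
for _ in range(20000):
    # epsilon-greedy action selection
    if random.random() < EPS:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda a_: Q[(s, a_)])
    s2, r = env_step(s, a)
    # standard temporal-difference update
    Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, a_)] for a_ in ACTIONS)
                          - Q[(s, a)])
    s = s2

greedy = {st: max(ACTIONS, key=lambda a_: Q[(st, a_)]) for st in STATES}
```

In the paper's setting the state would encode the grid's CPS status, the actions the command allocation among unit types, and the reward the chosen CPS objective; the learning loop itself is unchanged.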
Research Status and Prospects of the Dynamic Weapon-Target Assignment Problem (Cited: 46)
20
Authors: 刘传波, 邱志明, 吴玲, 王航宇 《电光与控制》 (Electronics Optics & Control) PKU Core, 2010, Issue 11, pp. 43-48 (6 pages)
Dynamic weapon-target assignment (DWTA) is an important theoretical problem that modern command-and-control systems urgently need to solve; time factors and random events further increase its complexity. After introducing the basic content of DWTA research, this paper summarizes and analyzes the main current approaches to the DWTA problem, including stage-decomposition methods, Markov decision processes, and anytime algorithms. It then points out the shortcomings of current DWTA research and the problems still to be solved, and argues that an intelligent algorithm with anytime behavior that can flexibly handle random events, while fully accounting for spatio-temporal constraints, is an effective route to solving the DWTA problem.
Keywords: dynamic weapon-target assignment; Markov decision process; time window; anytime algorithm