Journal Articles
107 articles found
1. Robust analysis of discounted Markov decision processes with uncertain transition probabilities (Cited by: 2)
Authors: LOU Zhen-kai, HOU Fu-jun, LOU Xu-ming. Applied Mathematics (A Journal of Chinese Universities), SCIE/CSCD, 2020, No. 4, pp. 417-436.
Optimal policies in Markov decision problems may be quite sensitive to transition probabilities, and in practice some transition probabilities may be uncertain. The goals of the present study are to find the robust range for a certain optimal policy and to obtain value intervals of exact transition probabilities. Our results contribute to the analysis of Markov decision processes (MDPs) with uncertain transition probabilities. We first propose a method for estimating unknown transition probabilities based on maximum likelihood. Since the estimation may be far from accurate, and the highest expected total reward of the MDP may be sensitive to these transition probabilities, we analyze the robustness of an optimal policy and propose an approach for robust analysis. After defining a robust optimal policy with uncertain transition probabilities represented as sets of numbers, we formulate a model to obtain the optimal policy. Finally, we define the value intervals of the exact transition probabilities and construct models to determine the lower and upper bounds. Numerical examples are given to show the practicability of our methods.
Keywords: Markov decision processes; uncertain transition probabilities; robustness and sensitivity; robust optimal policy; value interval
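The pipeline sketched in this abstract — count-based maximum-likelihood estimation of the transition kernel, followed by a probe of how far a probability can drift before the optimal policy changes — is straightforward to illustrate. Below is a minimal Python sketch on an invented 2-state, 2-action MDP; all counts, rewards, and the single-entry perturbation scheme are our own illustration, not the paper's models.

```python
import numpy as np

# Maximum-likelihood estimate of transition probabilities from observed
# (state, action, next_state) counts: p_hat(s'|s,a) = N(s,a,s') / N(s,a).
counts = np.array([[[12, 4], [3, 9]],      # hypothetical visit counts
                   [[6, 10], [8, 2]]])     # shape: (S, A, S')
P_hat = counts / counts.sum(axis=2, keepdims=True)

R = np.array([[1.0, 0.0], [0.5, 2.0]])     # invented rewards r(s, a)
beta = 0.9                                  # discount factor

def optimal_policy(P, R, beta, iters=500):
    """Value iteration; returns the greedy policy and its value."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = R + beta * P @ V                # Q[s, a]
        V = Q.max(axis=1)
    return Q.argmax(axis=1), V

pi_hat, V_hat = optimal_policy(P_hat, R, beta)

# Crude robustness probe: vary one uncertain entry p(0|0,0) over a grid and
# record where the greedy policy changes -- a rough "robust range" of pi_hat.
for p in np.linspace(0.05, 0.95, 19):
    P = P_hat.copy()
    P[0, 0] = [p, 1.0 - p]                  # renormalized 2-state row
    pi, _ = optimal_policy(P, R, beta)
    if not np.array_equal(pi, pi_hat):
        print(f"policy changes near p(0|0,0) = {p:.2f}")
        break
```

The paper's value intervals refine this grid probe into explicit lower- and upper-bound models; the sketch only shows where a single uncertain entry first flips the greedy policy.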
2. Variance minimization for continuous-time Markov decision processes: two approaches (Cited by: 1)
Authors: ZHU Quan-xin. Applied Mathematics (A Journal of Chinese Universities), SCIE/CSCD, 2010, No. 4, pp. 400-410.
This paper studies the limit average variance criterion for continuous-time Markov decision processes in Polish spaces. Based on two approaches, this paper proves not only the existence of solutions to the variance minimization optimality equation and the existence of a variance minimal policy that is canonical, but also the existence of solutions to the two variance minimization optimality inequalities and the existence of a variance minimal policy which may not be canonical. An example is given to illustrate all of our conditions.
Keywords: Continuous-time Markov decision process; Polish space; variance minimization; optimality equation; optimality inequality
3. Seeking for Passenger under Dynamic Prices: A Markov Decision Process Approach
Authors: Qianrong Shen. Journal of Computer and Communications, 2021, No. 12, pp. 80-97.
In recent years, ride-on-demand (RoD) services such as Uber and Didi have become increasingly popular. Different from traditional taxi services, RoD services adopt dynamic pricing mechanisms to manipulate supply and demand on the road, and such mechanisms improve service capacity and quality. Route recommendation for passenger seeking has been widely studied for taxi services. In RoD services, the dynamic price is a new and accurate indicator of the supply and demand condition, but it has rarely been studied as a source of clues for drivers seeking passengers. In this paper, we propose to incorporate the impact of dynamic prices as a key factor in recommending seeking routes to drivers. We first show the importance of doing so by analyzing real service data. We then design a Markov Decision Process (MDP) model based on passenger-order and car-GPS trajectory datasets, and take dynamic prices into account in designing rewards. Results show that our model not only guides drivers to locations with higher prices, but also significantly improves driver revenue: compared with revenue before using the model, the maximum yield can be increased by up to 28%.
Keywords: Ride-on-demand service; Markov decision process; dynamic pricing; taxi services; route recommendation
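To make the idea concrete, here is a hedged toy version of an MDP whose rewards fold in dynamic prices: a vacant car on a ring of city zones earns the zone's surge-multiplied fare weighted by its pickup probability. Zone count, probabilities, and multipliers are invented for illustration; the paper's model is built from real order and GPS trajectory data.

```python
import numpy as np

# Toy seeking-route MDP over a ring of city zones. A vacant car in zone z
# either stays or moves to an adjacent zone; the one-step reward combines
# pickup probability with the zone's dynamic price multiplier.
Z = 6
pickup_prob = np.array([0.10, 0.50, 0.30, 0.70, 0.20, 0.40])
price_mult  = np.array([1.0, 1.8, 1.2, 2.5, 1.1, 1.6])   # surge multipliers
base_fare, cruise_cost, gamma = 10.0, 1.0, 0.9

r = pickup_prob * price_mult * base_fare - cruise_cost    # r(z)

neighbors = [((z - 1) % Z, z, (z + 1) % Z) for z in range(Z)]
V = np.zeros(Z)
for _ in range(300):                        # value iteration on the ring
    V = np.array([max(r[n] + gamma * V[n] for n in neighbors[z])
                  for z in range(Z)])
policy = [max(neighbors[z], key=lambda n: r[n] + gamma * V[n])
          for z in range(Z)]
print("next zone to seek from each zone:", policy)
```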
4. A dynamical neural network approach for distributionally robust chance-constrained Markov decision process (Cited by: 1)
Authors: Tian Xia, Jia Liu, Zhiping Chen. Science China Mathematics, SCIE/CSCD, 2024, No. 6, pp. 1395-1418.
In this paper, we study the distributionally robust joint chance-constrained Markov decision process. Utilizing the logarithmic transformation technique, we derive its deterministic reformulation with bi-convex terms under the moment-based uncertainty set. To cope with the non-convexity and improve the robustness of the solution, we propose a dynamical neural network approach to solve the reformulated optimization problem. Numerical results on a machine replacement problem demonstrate the efficiency of the proposed dynamical neural network approach when compared with the sequential convex approximation approach.
Keywords: Markov decision process; chance constraints; distributionally robust optimization; moment-based ambiguity set; dynamical neural network
5. Rationale for Decision-Making Processes in Enhancement of Community Participation for Sustainable Mangrove Management in Lamu, Kenya
Authors: Jamila Ahmed, Bessy Kathambi, Robert Kibugi. Open Journal of Ecology, 2023, No. 6, pp. 409-421.
Decision-making is the process of deciding between two or more options in order to take the most appropriate and successful course of action toward sustainable mangrove management. However, the distinctiveness of mangroves as an ecosystem, and the attendant socio-economic and governance ramifications, make decision-making here relatively distinct from other decision-making processes. As a result, the purpose of this research was to evaluate the role that community engagement plays in the decision-making process as it relates to the establishment of governance norms for sustainable mangrove management in Lamu County. In this study, a correlational research design was applied, and the researchers employed a mixed-methods approach. The target population was 296 respondents. The research used questionnaires and interviews to collect data, and a descriptive statistical technique was utilized to inspect and analyze the data gathered. The findings indicated that awareness of governance standards is beneficial during the process of making decisions. In addition, the findings demonstrated that respondents had the impression that the decision-making process was not done properly. On the other hand, the participants pointed out the positive aspects of the decision-making process and agreed that the participation of both genders was essential for the sustainable management of mangroves. Based on these data, it appeared that full community engagement in decision-making is necessary for sustainable management of mangrove forests.
Keywords: Community engagement; sustainability; decision-making process; Lamu
6. A Comparative Analysis of Visualization Methods in Architecture: Employing Virtual Reality to Support the Decision-Making Process in the Architecture, Engineering, and Construction Industry
Authors: Ahmed Redha Gheraba, Debajyoti Pati, Clifford B. Fedler, Marcelo Schmidt, Michael S. Molina, Ali Nejat, Muge Mukaddes Darwish. Journal of Civil Engineering and Architecture, 2023, No. 2, pp. 73-89.
The design process of the built environment relies on the collaborative effort of all parties involved in the project. During the design phase, owners, end users, and their representatives are expected to make the most critical design and budgetary decisions, shaping the essential traits of the project; hence the need to create and integrate mechanisms that support the decision-making process. Design decisions should not be based on assumptions, past experiences, or imagination. An example of the numerous problems that result from uninformed design decisions is "change orders", known as deviations from the original scope of work, which increase the overall cost and change the construction schedule of the project. The long-term aim of this inquiry is to understand user behavior and establish evidence-based control measures: actions and processes that can be implemented in practice to decrease the volume and frequency of change orders. The current study developed a foundation for further examination by proposing potential control measures and testing their efficiency, such as integrating Virtual Reality (VR). The specific aim was to examine the effect of different visualization methods (i.e., VR vs. construction drawings) on (1) how well the subjects understand the information presented about the future/planned environment; (2) the subjects' perceived confidence in what the future environment will look like; (3) the likelihood of changing the built environment; (4) design review time; and (5) accuracy in reviewing and understanding the design.
Keywords: Virtual reality; construction change orders; architectural visualization; decision-making process; construction management; construction technology; interior environmental design
7. Driving force planning in shield tunneling based on Markov decision processes (Cited by: 7)
Authors: HU XiangTao, HUANG YongAn, YIN ZhouPing, XIONG YouLun. Science China (Technological Sciences), SCIE/EI/CAS, 2012, No. 4, pp. 1022-1030.
In shield tunneling, the control system needs very reliable deviation-rectifying capability to ensure that the tunnel trajectory meets the permissible criterion. To this end, we present an approach that adopts Markov decision process (MDP) theory to plan the driving force with explicit representation of the uncertainty during excavation. The shield attitudes of possible worlds and the driving forces during excavation are discretized as a state set and an action set, respectively. In particular, an evaluation function is proposed with consideration of the stability of the driving force and the deviation of the shield attitude. Unlike the deterministic approach, the driving forces based on the MDP model lead to an uncertain effect, and the attitude is known only with an imprecise probability. We consider the case that the transition probability varies in a given domain estimated by field data, and discuss the optimal policy based on interval arithmetic. The validity of the approach is discussed by comparing the driving force planning with actual operating data from the field records of Line 9 in Tianjin. It is shown that the MDP model is reasonable enough to predict the driving force for automatic deviation rectifying.
Keywords: Shield tunneling; Markov decision process; automatic deviation rectifying; interval arithmetic; driving force planning
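The interval-probability setting in this abstract has a well-known computational core: a Bellman backup in which the expectation is taken over the worst transition row consistent with the interval bounds. Here is a hedged Python sketch of that inner step; the greedy solution is standard for interval/box uncertainty, and the numbers are invented, not drawn from the Tianjin Line 9 data.

```python
import numpy as np

def worst_case_expectation(lo, hi, v):
    """Minimize sum_i p_i * v_i over probability vectors with
    lo <= p <= hi and sum(p) == 1: push mass onto low-value states first.
    This is the standard inner problem of interval-uncertainty robust MDPs."""
    p = lo.copy()
    slack = 1.0 - lo.sum()                  # mass still to distribute
    for i in np.argsort(v):                 # lowest-value states first
        add = min(hi[i] - lo[i], slack)
        p[i] += add
        slack -= add
    return p @ v

# Interval bounds on one transition row, as if estimated from field data.
lo = np.array([0.1, 0.2, 0.1])
hi = np.array([0.5, 0.6, 0.4])
v  = np.array([3.0, 1.0, 2.0])              # value of each successor state
print(worst_case_expectation(lo, hi, v))    # pessimistic one-step value
```

Repeating this backup for every state-action pair inside value iteration yields a robust policy against all transition kernels within the intervals.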
8. A minimization problem of the risk probability in first passage semi-Markov decision processes with loss rates (Cited by: 3)
Authors: HUANG XiangXiang, ZOU XiaoLong, GUO XianPing. Science China Mathematics, SCIE/CSCD, 2015, No. 9, pp. 1923-1938.
This paper is the first attempt to investigate the risk probability criterion in semi-Markov decision processes with loss rates. The goal is to find an optimal policy with the minimum risk probability that the total loss incurred during a first passage time to some target set exceeds a loss level. First, we establish the optimality equation via a successive approximation technique, and show that the value function is the unique solution to the optimality equation. Second, we give suitable conditions, under which we prove the existence of optimal policies and develop an algorithm for computing ε-optimal policies. Finally, we apply our main results to a business system.
Keywords: Semi-Markov decision processes; loss rate; risk probability; first passage time; optimal policy; iteration algorithm
9. First Passage Models for Denumerable Semi-Markov Decision Processes with Nonnegative Discounted Costs (Cited by: 2)
Authors: Yong-hui Huang, Xian-ping Guo. Acta Mathematicae Applicatae Sinica, SCIE/CSCD, 2011, No. 2, pp. 177-190.
This paper considers a first passage model for discounted semi-Markov decision processes with denumerable states and nonnegative costs. The criterion to be optimized is the expected discounted cost incurred during a first passage time to a given target set. We first construct a semi-Markov decision process under a given semi-Markov decision kernel and a policy. Then, we prove that the value function satisfies the optimality equation and that there exists an optimal (or ε-optimal) stationary policy under suitable conditions, using a minimum nonnegative solution approach. Furthermore, we give some properties of optimal policies. In addition, a value iteration algorithm for computing the value function and optimal policies is developed, and an example is given. Finally, it is shown that our model is an extension of the first passage models for both discrete-time and continuous-time Markov decision processes.
Keywords: Semi-Markov decision processes; target set; first passage time; discounted cost; optimal policy
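A value iteration of the kind mentioned at the end of this abstract can be sketched for a finite-state special case: target states are made absorbing and cost-free, so the iteration converges to the expected discounted cost until first passage. All data below are invented for illustration.

```python
import numpy as np

S, A, beta = 4, 2, 0.95
target = {3}                                   # target set: absorbing, cost 0
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))     # invented kernel P[s, a, s']
c = rng.uniform(1.0, 3.0, size=(S, A))         # nonnegative costs c(s, a)

in_target = np.array([s in target for s in range(S)])
V = np.zeros(S)
for _ in range(2000):
    # Stop accruing cost and onward value once the target set is reached:
    # successor values are zeroed on target states.
    Q = c + beta * P @ np.where(in_target, 0.0, V)
    V_new = np.where(in_target, 0.0, Q.min(axis=1))
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new

print("first-passage values:", V.round(3))
print("optimal actions:", Q.argmin(axis=1))
```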
10. Optimal Policies for Quantum Markov Decision Processes (Cited by: 2)
Authors: Ming-Sheng Ying, Yuan Feng, Sheng-Gang Ying. International Journal of Automation and Computing, EI/CSCD, 2021, No. 3, pp. 410-421.
Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDP, namely quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
Keywords: Quantum Markov decision processes; quantum machine learning; reinforcement learning; dynamic programming; decision making
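For orientation, the classical finite-horizon dynamic programming recursion that qMDPs generalize is the backward induction below; in the quantum model, states become density operators and actions become quantum operations, so this is only the classical counterpart, in our notation:

```latex
V_T(s) = 0, \qquad
V_t(s) = \max_{a \in A(s)} \Big[\, r(s,a) + \sum_{s'} p(s' \mid s,a)\, V_{t+1}(s') \,\Big],
\qquad t = T-1, \dots, 0,
```

with any maximizing action yielding an optimal decision rule $\pi_t(s)$ at stage $t$.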
11. First passage Markov decision processes with constraints and varying discount factors (Cited by: 2)
Authors: Xiao WU, Xiaolong ZOU, Xianping GUO. Frontiers of Mathematics in China, SCIE/CSCD, 2015, No. 4, pp. 1005-1023.
This paper focuses on the constrained optimality problem (COP) of first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces, with multi-constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of a so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear programming problem on the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear programming, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results.
Keywords: Discrete-time Markov decision process (DTMDP); constrained optimality; varying discount factor; unbounded cost
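The occupation-measure reduction becomes an ordinary finite linear program in the finite state-action case. Below is a hedged Python sketch for the classical constant-discount analogue (the paper treats varying, state-dependent discount factors and first-passage costs; all data here are invented), using `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog

# Constrained discounted MDP solved as an LP over occupation measures
# x(s, a) >= 0: minimize sum c*x subject to flow-balance equalities and a
# budget on a secondary cost. Constant discount factor for simplicity.
S, A, gamma = 3, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(S, A))     # P[s, a, s']
c = rng.uniform(0.0, 1.0, (S, A))              # cost to minimize
d = rng.uniform(0.0, 0.2, (S, A))              # constrained secondary cost
mu = np.full(S, 1.0 / S)                       # initial distribution
kappa = 1.5                                    # budget on discounted d-cost

# Balance: sum_a x(s',a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = mu(s').
A_eq = np.zeros((S, S * A))
for s2 in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[s2, s * A + a] = float(s == s2) - gamma * P[s, a, s2]

res = linprog(c.ravel(), A_ub=d.ravel()[None, :], b_ub=[kappa],
              A_eq=A_eq, b_eq=mu, bounds=(0, None))
x = res.x.reshape(S, A)
policy = x / x.sum(axis=1, keepdims=True)      # randomized optimal policy
print("optimal cost:", round(res.fun, 4))
print("policy:\n", policy.round(3))
```

Normalizing each row of the optimal occupation measure recovers the (possibly randomized) stationary optimal policy, mirroring the "exact form" result quoted in the abstract.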
12. Convergence of Markov decision processes with constraints and state-action dependent discount factors (Cited by: 2)
Authors: Xiao Wu, Xianping Guo. Science China Mathematics, SCIE/CSCD, 2020, No. 1, pp. 167-182.
This paper is concerned with the convergence of a sequence of discrete-time Markov decision processes (DTMDPs) with constraints, state-action dependent discount factors, and possibly unbounded costs. Using the convex analytic approach under mild conditions, we prove that the optimal values and optimal policies of the original DTMDPs converge to those of the "limit" one. Furthermore, we show that any countable-state DTMDP can be approximated by a sequence of finite-state DTMDPs, which are constructed using the truncation technique. Finally, we illustrate the approximation by solving a controlled queueing system numerically, and give the corresponding error bound of the approximation.
Keywords: Discrete-time Markov decision processes; state-action dependent discount factors; unbounded costs; convergence
13. A review on Markov Decision Processes (Cited by: 4)
Authors: J. A. Filar (Centre for Industrial and Applicable Mathematics, University of South Australia, Australia), LIU Ke (Institute of Applied Mathematics, Chinese Academy of Sciences, Beijing 100080, China). Chinese Science Bulletin, SCIE/EI/CAS, 1999, No. 7, p. 672.
Markov decision processes (MDPs) have been studied by mathematicians, probabilists, operations researchers, and engineers since the late 1950s. In an MDP, a stochastic, dynamic system is controlled by a "policy" selected by a decision-maker/controller, with the goal of maximizing an overall reward function that is an appropriately defined aggregate of immediate rewards, over either a finite or infinite time horizon. As such, MDPs are a useful paradigm for modeling many processes occurring naturally in management and engineering contexts.
Keywords: A review on Markov decision processes
14. Solving Markov Decision Processes with Downside Risk Adjustment (Cited by: 1)
Authors: Abhijit Gosavi, Anish Parulekar. International Journal of Automation and Computing, EI/CSCD, 2016, No. 3, pp. 235-245.
Markov decision processes (MDPs) and their variants are widely studied in the theory of controls for stochastic discrete-event systems driven by Markov chains. Much of the literature focuses on the risk-neutral criterion in which the expected rewards, either average or discounted, are maximized. Some literature on MDPs takes risks into account; much of it addresses the exponential utility (EU) function and mechanisms to penalize different forms of variance of the rewards. EU functions have some numerical deficiencies, while variance measures variability both above and below the mean rewards; the variability above mean rewards is usually beneficial and should not be penalized or avoided. As such, risk metrics that account for pre-specified targets (thresholds) for rewards have been considered in the literature, where the goal is to penalize the risks of revenues falling below those targets. Existing work on MDPs that takes targets into account seeks to minimize risks of this nature. Minimizing risks can lead to poor solutions where the risk is zero or near zero but the average rewards are also rather low. In this paper, hence, we study a risk-averse criterion, in particular the so-called downside risk, which equals the probability of the revenues falling below a given target; in contrast to minimizing such risks, we only reduce this risk at the cost of slightly lowered average rewards. A solution where the risk is low and the average reward is quite high, although not at its maximum attainable value, is very attractive in practice. To be more specific, in our formulation the objective function is the expected value of the rewards minus a scalar times the downside risk. In this setting, we analyze the infinite-horizon MDP, the finite-horizon MDP, and the infinite-horizon semi-MDP (SMDP). We develop dynamic programming and reinforcement learning algorithms for the finite and infinite horizon. The algorithms are tested in numerical studies and show encouraging performance.
Keywords: Downside risk; Markov decision processes; reinforcement learning; dynamic programming; targets; thresholds
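In symbols, the risk-adjusted objective this abstract describes is (our notation): with total reward $R^{\pi}$ under policy $\pi$, target $\tau$, and risk-aversion weight $\lambda > 0$,

```latex
\max_{\pi} \; \mathbb{E}\left[ R^{\pi} \right] \;-\; \lambda \, \Pr\!\left( R^{\pi} < \tau \right),
```

where the second term is exactly the downside risk: the probability that revenues fall below the target.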
15. First Passage Risk Probability Minimization for Piecewise Deterministic Markov Decision Processes (Cited by: 1)
Authors: Xin WEN, Hai-feng HUO, Xian-ping GUO. Acta Mathematicae Applicatae Sinica, SCIE/CSCD, 2022, No. 3, pp. 549-567.
This paper is an attempt to study the minimization problem of the risk probability of piecewise deterministic Markov decision processes (PDMDPs) with unbounded transition rates and Borel spaces. Different from the expected discounted and average criteria in the existing literature, we consider the risk probability that the total rewards produced by a system do not exceed a prescribed goal during a first passage time to some target set, and aim to find a policy that minimizes the risk probability over the class of all history-dependent policies. Under suitable conditions, we derive the optimality equation (OE) for the probability criterion, prove that the value function of the minimization problem is the unique solution to the OE, and establish the existence of ε (≥ 0)-optimal policies. Finally, we provide two examples to illustrate our results.
Keywords: Piecewise deterministic Markov decision processes; risk probability; first passage time; ε-optimal policy
16. SINGULARLY PERTURBED MARKOV DECISION PROCESSES WITH INCLUSION OF TRANSIENT STATES (Cited by: 1)
Authors: R. H. Liu, Q. Zhang, G. Yin. Journal of Systems Science & Complexity, SCIE/EI/CSCD, 2001, No. 2, pp. 199-211.
This paper is concerned with continuous-time Markov decision processes (MDP) having weak and strong interactions. Using a hierarchical approach, the state space of the underlying Markov chain can be decomposed into several groups of recurrent states and a group of transient states, resulting in a singularly perturbed MDP formulation. Instead of solving the original problem directly, a limit problem that is much simpler to handle is derived. On the basis of the optimal solution of the limit problem, nearly optimal decisions are constructed for the original problem. The asymptotic optimality of the constructed control is obtained; the rate of convergence is ascertained.
Keywords: Markov decision process; dynamic programming; asymptotically optimal control
17. STRONG N-DISCOUNT AND FINITE-HORIZON OPTIMALITY FOR CONTINUOUS-TIME MARKOV DECISION PROCESSES (Cited by: 1)
Authors: ZHU Quanxin, GUO Xianping. Journal of Systems Science & Complexity, SCIE/EI/CSCD, 2014, No. 5, pp. 1045-1063.
This paper studies the strong n (n = −1, 0)-discount and finite-horizon criteria for continuous-time Markov decision processes in Polish spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. Under mild conditions, the authors prove the existence of strong n (n = −1, 0)-discount optimal stationary policies by developing two equivalence relations: one is between the standard expected average reward and strong −1-discount optimality, and the other is between the bias and strong 0-discount optimality. The authors also prove the existence of an optimal policy for a finite-horizon control problem by developing an interesting characterization of a canonical triplet.
Keywords: Continuous-time Markov decision process; expected average reward criterion; finite-horizon optimality; Polish space; strong n-discount optimality
18. Average Sample-path Optimality for Continuous-time Markov Decision Processes in Polish Spaces
Authors: Quan-xin ZHU. Acta Mathematicae Applicatae Sinica, SCIE/CSCD, 2011, No. 4, pp. 613-624.
In this paper we study the average sample-path cost (ASPC) problem for continuous-time Markov decision processes in Polish spaces. To the best of our knowledge, this paper is a first attempt to study the ASPC criterion on continuous-time MDPs with Polish state and action spaces. The corresponding transition rates are allowed to be unbounded, and the cost rates may have neither upper nor lower bounds. Under some mild hypotheses, we prove the existence of ε (> 0)-ASPC optimal stationary policies based on two different approaches: one is the "optimality equation" approach and the other is the "two optimality inequalities" approach.
Keywords: Continuous-time Markov decision process; average sample-path optimality; Polish space; optimality equation; optimality inequality
19. TOTAL REWARD CRITERIA FOR UNCONSTRAINED/CONSTRAINED CONTINUOUS-TIME MARKOV DECISION PROCESSES
Authors: Xianping GUO, Lanlan ZHANG. Journal of Systems Science & Complexity, SCIE/EI/CSCD, 2011, No. 3, pp. 491-505.
This paper studies denumerable continuous-time Markov decision processes with expected total reward criteria. The authors first study the unconstrained model with possibly unbounded transition rates, and give suitable conditions on the controlled system's primitive data under which they show the existence of a solution to the total reward optimality equation and the existence of an optimal stationary policy. Then, the authors impose a constraint on an expected total cost and consider the associated constrained model. Based on the results about the unconstrained model and using the Lagrange multipliers approach, the authors prove the existence of constrained-optimal policies under some additional conditions. Finally, the authors apply the results to controlled queueing systems.
Keywords: Constrained-optimal policy; continuous-time Markov decision process; optimal policy; total reward criterion; unbounded reward/cost and transition rates
20. An average-value-at-risk criterion for Markov decision processes with unbounded costs
Authors: Qiuli LIU, Wai-Ki CHING, Junyu ZHANG, Hongchu WANG. Frontiers of Mathematics in China, SCIE/CSCD, 2022, No. 4, pp. 673-687.
We study Markov decision processes under the average-value-at-risk criterion. The state space and the action space are Borel spaces, the costs are allowed to be unbounded from above, and the discount factors are state-action dependent. Under suitable conditions, we establish the existence of optimal deterministic stationary policies. Furthermore, we apply our main results to a cash-balance model.
Keywords: Markov decision processes; average-value-at-risk (AVaR); state-action dependent discount factors; optimal policy
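For orientation, the average-value-at-risk of a cost $X$ at confidence level $\alpha \in (0,1)$ is commonly written in the Rockafellar–Uryasev form (the paper's dynamic, state-action dependent discounting version is more involved; this is only the static definition):

```latex
\mathrm{AVaR}_{\alpha}(X) \;=\; \inf_{z \in \mathbb{R}} \left\{\, z + \frac{1}{1-\alpha}\, \mathbb{E}\big[ (X - z)^{+} \big] \right\}.
```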