68 articles found
Variance minimization for continuous-time Markov decision processes: two approaches (Cited by: 1)
1
Author: ZHU Quan-xin. Applied Mathematics (A Journal of Chinese Universities), SCIE CSCD, 2010, Issue 4, pp. 400-410, 11 pages
This paper studies the limit average variance criterion for continuous-time Markov decision processes in Polish spaces. Based on two approaches, this paper proves not only the existence of solutions to the variance minimization optimality equation and the existence of a variance minimal policy that is canonical, but also the existence of solutions to the two variance minimization optimality inequalities and the existence of a variance minimal policy which may not be canonical. An example is given to illustrate all of our conditions.
Keywords: Continuous-time Markov decision process; Polish space; variance minimization; optimality equation; optimality inequality
Robust analysis of discounted Markov decision processes with uncertain transition probabilities (Cited by: 1)
2
Authors: LOU Zhen-kai, HOU Fu-jun, LOU Xu-ming. Applied Mathematics (A Journal of Chinese Universities), SCIE CSCD, 2020, Issue 4, pp. 417-436, 20 pages
Optimal policies in Markov decision problems may be quite sensitive with regard to transition probabilities. In practice, some transition probabilities may be uncertain. The goals of the present study are to find the robust range for a certain optimal policy and to obtain value intervals of exact transition probabilities. Our research yields powerful contributions for Markov decision processes (MDPs) with uncertain transition probabilities. We first propose a method for estimating unknown transition probabilities based on maximum likelihood. Since the estimation may be far from accurate, and the highest expected total reward of the MDP may be sensitive to these transition probabilities, we analyze the robustness of an optimal policy and propose an approach for robust analysis. After giving the definition of a robust optimal policy with uncertain transition probabilities represented as sets of numbers, we formulate a model to obtain the optimal policy. Finally, we define the value intervals of the exact transition probabilities and construct models to determine the lower and upper bounds. Numerical examples are given to show the practicability of our methods.
Keywords: Markov decision processes; uncertain transition probabilities; robustness and sensitivity; robust optimal policy; value interval
Seeking for Passenger under Dynamic Prices: A Markov Decision Process Approach
3
Author: Qianrong Shen. Journal of Computer and Communications, 2021, Issue 12, pp. 80-97, 18 pages
In recent years, ride-on-demand (RoD) services such as Uber and Didi have become increasingly popular. Unlike traditional taxi services, RoD services adopt dynamic pricing mechanisms to manipulate the supply and demand on the road, and such mechanisms improve service capacity and quality. Seeking route recommendation has been widely studied in taxi service. In RoD services, the dynamic price is a new and accurate indicator that represents the supply and demand condition, but it has rarely been studied as a source of clues for drivers seeking passengers. In this paper, we propose to incorporate the impact of dynamic prices as a key factor in recommending seeking routes to drivers. We first show the importance of and need for doing so by analyzing real service data. We then design a Markov Decision Process (MDP) model based on passenger order and car GPS trajectory datasets, and take dynamic prices into account in designing rewards. Results show that our model not only guides drivers to locations with higher prices, but also significantly improves driver revenue. Compared with the drivers' situation before using the model, the maximum yield after using it can be increased to 28%.
Keywords: Ride-on-Demand Service; Markov decision process; Dynamic Pricing; Taxi Services; Route Recommendation
A dynamical neural network approach for distributionally robust chance-constrained Markov decision process
4
Authors: Tian Xia, Jia Liu, Zhiping Chen. Science China Mathematics, SCIE CSCD, 2024, Issue 6, pp. 1395-1418, 24 pages
In this paper, we study the distributionally robust joint chance-constrained Markov decision process. Utilizing the logarithmic transformation technique, we derive its deterministic reformulation with bi-convex terms under the moment-based uncertainty set. To cope with the non-convexity and improve the robustness of the solution, we propose a dynamical neural network approach to solve the reformulated optimization problem. Numerical results on a machine replacement problem demonstrate the efficiency of the proposed dynamical neural network approach when compared with the sequential convex approximation approach.
Keywords: Markov decision process; chance constraints; distributionally robust optimization; moment-based ambiguity set; dynamical neural network
Heterogeneous Network Selection Optimization Algorithm Based on a Markov Decision Model (Cited by: 5)
5
Authors: Jianli Xie, Wenjuan Gao, Cuiran Li. China Communications, SCIE CSCD, 2020, Issue 2, pp. 40-53, 14 pages
A network selection optimization algorithm based on the Markov decision process (MDP) is proposed so that mobile terminals can always connect to the best wireless network in a heterogeneous network environment. Considering the different types of service requirements, the MDP model and its reward function are constructed based on the quality of service (QoS) attribute parameters of the mobile users, and the network attribute weights are calculated by using the analytic hierarchy process (AHP). The network handoff decision condition is designed according to the different types of user services and the time-varying characteristics of the network, and the MDP model is solved by using the genetic algorithm and simulated annealing (GA-SA); thus, users can seamlessly switch to the network with the best long-term expected reward value. Simulation results show that the proposed algorithm has good convergence performance, and can guarantee that users with different service types will obtain satisfactory expected total reward values and have low numbers of network handoffs.
Keywords: heterogeneous wireless networks; Markov decision process; reward function; genetic algorithm; simulated annealing
Recorded recurrent deep reinforcement learning guidance laws for intercepting endoatmospheric maneuvering missiles
6
Authors: Xiaoqi Qiu, Peng Lai, Changsheng Gao, Wuxing Jing. Defence Technology (防务技术), SCIE EI CAS CSCD, 2024, Issue 1, pp. 457-470, 14 pages
This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm to solve the challenge of constructing guidance laws for intercepting endoatmospheric maneuvering missiles with uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Given the benefits of recurrent neural networks (RNNs) in processing sequence information, an RNN layer is incorporated into the agent's policy network to alleviate the bottleneck of traditional deep reinforcement learning methods while dealing with POMDPs. The measurements from the interceptor's seeker during each guidance cycle are combined into one sequence as the input to the policy network, since the detection frequency of an interceptor is usually higher than its guidance frequency. During training, the hidden states of the RNN layer in the policy network are recorded to overcome the partially observable problem that this RNN layer causes inside the agent. The training curves show that the proposed RRTD3 successfully enhances data efficiency, training speed, and training stability. The test results confirm the advantages of the RRTD3-based guidance laws over some conventional guidance laws.
Keywords: Endoatmospheric interception; Missile guidance; Reinforcement learning; Markov decision process; Recurrent neural networks
Performance sensitivities for parameterized Markov systems
7
Authors: Xiren CAO, Junyu ZHANG. Control Theory and Applications (English Edition) (《控制理论与应用(英文版)》), EI, 2004, Issue 1, pp. 65-68, 4 pages
It is known that the performance potentials (or equivalently, perturbation realization factors) can be used as building blocks for performance sensitivities of Markov systems. In parameterized systems, the changes in parameters may only affect some states, and the explicit transition probability matrix may not be known. In this paper, we use an example to show that we can use potentials to construct performance sensitivities in a more flexible way; only the potentials at the affected states need to be estimated, and the transition probability matrix need not be known. Policy iteration algorithms, which are simpler than the standard one, can be established.
Keywords: Perturbation analysis; Markov decision processes; Policy iteration; Reinforcement learning; Perturbation realization
Grid Integration of Wind Generation Considering Remote Wind Farms:Hybrid Markovian and Interval Unit Commitment
8
Authors: Bing Yan, Haipei Fan, Peter B. Luh, Khosrow Moslehi, Xiaoming Feng, Chien Ning Yu, Mikhail A. Bragin, Yaowen Yu. IEEE/CAA Journal of Automatica Sinica, SCIE EI CSCD, 2017, Issue 2, pp. 205-215, 11 pages
Grid integration of wind power is essential to reduce fossil fuel usage but challenging in view of the intermittent nature of wind. Recently, we developed a hybrid Markovian and interval approach for the unit commitment and economic dispatch problem where power generation of conventional units is linked to local wind states to dampen the effects of wind uncertainties. Also, to reduce complexity, extreme and expected states are considered as interval modeling. Although this approach is effective, the fact that major wind farms are often located in remote locations and not accompanied by conventional units leads to conservative results. Furthermore, weights of extreme and expected states in the objective function are difficult to tune, resulting in significant differences between optimization and simulation costs. In this paper, each remote wind farm is paired with a conventional unit to dampen the effects of wind uncertainties without using expensive utility-scaled battery storage, and extra constraints are innovatively established to model pairing. Additionally, proper weights are derived through a novel quadratic fit of cost functions. The problem is solved by using a creative integration of our recent surrogate Lagrangian relaxation and branch-and-cut. Results demonstrate modeling accuracy, computational efficiency, and significant reduction of conservativeness of the previous approach.
Keywords: branch-and-cut; interval optimization; Markov decision process; remote wind farms; surrogate Lagrangian relaxation (SLR); unit commitment
Opportunistic admission and resource allocation for slicing enhanced IoT networks
9
Authors: Long Zhang, Bin Cao, Gang Feng. Digital Communications and Networks, SCIE CSCD, 2023, Issue 6, pp. 1465-1476, 12 pages
Network slicing is envisioned as one of the key techniques to meet the extremely diversified service requirements of the Internet of Things (IoT) as it provides an enhanced user experience and elastic resource configuration. In the context of slicing enhanced IoT networks, both the Service Provider (SP) and Infrastructure Provider (InP) face challenges of ensuring efficient slice construction and high profit in dynamic environments. These challenges arise from randomly generated and departed slice requests from end-users, uncertain resource availability, and multidimensional resource allocation. Admission and resource allocation for distinct demands of slice requests are the key issues in addressing these challenges and should be handled effectively in dynamic environments. To this end, we propose an Opportunistic Admission and Resource allocation (OAR) policy to deal with the issues of random slicing requests, uncertain resource availability, and heterogeneous multi-resources. The key idea of OAR is to allow the SP to decide whether to accept slice requests immediately or defer them according to the load and price of resources. To cope with the random slice requests and uncertain resource availability, we formulated this issue as a Markov Decision Process (MDP) to obtain the optimal admission policy, with the aim of maximizing the system reward. Furthermore, the buyer-seller game theory approach was adopted to realize the optimal resource allocation, while motivating each SP and InP to maximize their rewards. Our numerical results show that the proposed OAR policy can make reasonable decisions effectively and steadily, and outperforms the baseline schemes in terms of the system reward.
Keywords: Slice; IoT; Markov decision process; Game theory; Admission and resource allocation
SBFT:A BFT Consensus Mechanism Based on DQN Algorithm for Industrial Internet of Thing
10
Authors: Ningjie Gao, Ru Huo, Shuo Wang, Jiang Liu, Tao Huang, Yunjie Liu. China Communications, SCIE CSCD, 2023, Issue 10, pp. 185-199, 15 pages
With the development and widespread use of blockchain in recent years, many projects have introduced blockchain technology to solve the growing security issues of the Industrial Internet of Things (IIoT). However, due to the conflict between the operational performance and security of the blockchain system and the compatibility issues with a large number of IIoT devices running together, the mainstream blockchain system cannot be applied to IIoT scenarios. In order to solve these problems, this paper proposes SBFT (Speculative Byzantine Consensus Protocol), a flexible and scalable blockchain consensus mechanism for the Industrial Internet of Things. SBFT has a consensus process based on speculation, improving the throughput and consensus speed of blockchain systems and reducing communication overhead. In order to improve the compatibility and scalability of the blockchain system, we select some nodes to participate in the consensus, and these nodes have better performance in the network. Since multiple properties determine node performance, we abstract the node selection problem as a joint optimization problem and use Dueling Deep Q Learning (DQL) to solve it. Finally, we evaluate the performance of the scheme through simulation, and the simulation results prove the superiority of our scheme.
Keywords: Industrial Internet of Things; Byzantine fault tolerance; speculative consensus mechanism; Markov decision process; deep reinforcement learning
Multi-Agent Deep Reinforcement Learning for Cross-Layer Scheduling in Mobile Ad-Hoc Networks
11
Authors: Xinxing Zheng, Yu Zhao, Joohyun Lee, Wei Chen. China Communications, SCIE CSCD, 2023, Issue 8, pp. 78-88, 11 pages
Due to the fading characteristics of wireless channels and the burstiness of data traffic, how to deal with congestion in Ad-hoc networks with effective algorithms is still open and challenging. In this paper, we focus on enabling congestion control to minimize network transmission delays through flexible power control. To effectively solve the congestion problem, we propose a distributed cross-layer scheduling algorithm, which is empowered by graph-based multi-agent deep reinforcement learning. The transmit power is adaptively adjusted in real-time by our algorithm based only on local information (i.e., channel state information and queue length) and local communication (i.e., information exchanged with neighbors). Moreover, the training complexity of the algorithm is low due to the regional cooperation based on the graph attention network. In the evaluation, we show that our algorithm can reduce the transmission delay of data flow under severe signal interference and drastically changing channel states, and demonstrate the adaptability and stability in different topologies. The method is general and can be extended to various types of topologies.
Keywords: Ad-hoc network; cross-layer scheduling; multi-agent deep reinforcement learning; interference elimination; power control; queue scheduling; actor-critic methods; Markov decision process
Optimal Policies for Quantum Markov Decision Processes (Cited by: 2)
12
Authors: Ming-Sheng Ying, Yuan Feng, Sheng-Gang Ying. International Journal of Automation and Computing, EI CSCD, 2021, Issue 3, pp. 410-421, 12 pages
Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDP, namely quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and finding optimal policies for qMDPs in the case of finite-horizon. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
Keywords: Quantum Markov decision processes; quantum machine learning; reinforcement learning; dynamic programming; decision making
Convergence of Markov decision processes with constraints and state-action dependent discount factors (Cited by: 2)
13
Authors: Xiao Wu, Xianping Guo. Science China Mathematics, SCIE CSCD, 2020, Issue 1, pp. 167-182, 16 pages
This paper is concerned with the convergence of a sequence of discrete-time Markov decision processes (DTMDPs) with constraints, state-action dependent discount factors, and possibly unbounded costs. Using the convex analytic approach under mild conditions, we prove that the optimal values and optimal policies of the original DTMDPs converge to those of the "limit" one. Furthermore, we show that any countable-state DTMDP can be approximated by a sequence of finite-state DTMDPs, which are constructed using the truncation technique. Finally, we illustrate the approximation by solving a controlled queueing system numerically, and give the corresponding error bound of the approximation.
Keywords: discrete-time Markov decision processes; state-action dependent discount factors; unbounded costs; convergence
First passage Markov decision processes with constraints and varying discount factors (Cited by: 2)
14
Authors: Xiao WU, Xiaolong ZOU, Xianping GUO. Frontiers of Mathematics in China, SCIE CSCD, 2015, Issue 4, pp. 1005-1023, 19 pages
This paper focuses on the constrained optimality problem (COP) of first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multi-constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of a so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear programming problem on the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear programming problem, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results.
Keywords: Discrete-time Markov decision process (DTMDP); constrained optimality; varying discount factor; unbounded cost
A review on Markov Decision Processes (Cited by: 4)
15
Authors: J. A. Filar and LIU Ke. Affiliations: Centre for Industrial and Applicable Mathematics, University of South Australia, Australia; Institute of Applied Mathematics, Chinese Academy of Sciences, Beijing 100080, China. Chinese Science Bulletin, SCIE EI CAS, 1999, Issue 7, p. 672, 1 page
Markov decision processes (MDPs) have been studied by mathematicians, probabilists, operations researchers and engineers since the late 1950s. In an MDP, a stochastic, dynamic system is controlled by a 'policy' selected by a decision-maker/controller, with the goal of maximizing an overall reward function that is an appropriately defined aggregate of immediate rewards, over either a finite or an infinite time horizon. As such, MDPs are a useful paradigm for modeling many processes occurring naturally in management and engineering contexts.
Keywords: A review on Markov decision processes
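As an illustration of the control problem this abstract describes, the following is a minimal sketch of value iteration for a finite MDP under the infinite-horizon discounted criterion, which is only one of the aggregates the review mentions. The two-state model, the names `P_TOY`, `R_TOY` and `value_iteration`, and all numbers are hypothetical and do not come from the review.

```python
# Minimal value-iteration sketch for a finite, discounted MDP.
# The toy model below is hypothetical and chosen only so the answer
# can be checked by hand.

# P_TOY[s][a] is a list of (probability, next_state) pairs.
P_TOY = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
# R_TOY[s][a] is the immediate reward for taking action a in state s.
R_TOY = [
    [0.0, 1.0],
    [2.0, 0.0],
]


def q_value(P, R, V, s, a, gamma):
    """Immediate reward plus discounted expected value of the next state."""
    return R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Return (V, policy): approximate optimal values and a greedy policy."""
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iter):
        # Bellman optimality update: best action value in every state.
        V_new = [
            max(q_value(P, R, V, s, a, gamma) for a in range(len(P[s])))
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    # Greedy policy with respect to the final value estimate.
    policy = [
        max(range(len(P[s])), key=lambda a: q_value(P, R, V, s, a, gamma))
        for s in range(n_states)
    ]
    return V, policy


if __name__ == "__main__":
    V, policy = value_iteration(P_TOY, R_TOY, gamma=0.9)
    # By hand: staying in state 1 forever is worth 2 / (1 - 0.9) = 20,
    # and moving from state 0 to state 1 is worth 1 + 0.9 * 20 = 19.
    print(V, policy)
```

Here the 'policy' is the list of chosen actions per state, and the overall reward is the discounted sum of immediate rewards; a finite-horizon or average-reward criterion would need a different update.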
Solving Markov Decision Processes with Downside Risk Adjustment (Cited by: 1)
16
Authors: Abhijit Gosavi, Anish Parulekar. International Journal of Automation and Computing, EI CSCD, 2016, Issue 3, pp. 235-245, 11 pages
Markov decision processes (MDPs) and their variants are widely studied in the theory of controls for stochastic discrete-event systems driven by Markov chains. Much of the literature focusses on the risk-neutral criterion in which the expected rewards, either average or discounted, are maximized. There exists some literature on MDPs that takes risks into account. Much of this addresses the exponential utility (EU) function and mechanisms to penalize different forms of variance of the rewards. EU functions have some numerical deficiencies, while variance measures variability both above and below the mean rewards; the variability above mean rewards is usually beneficial and should not be penalized/avoided. As such, risk metrics that account for pre-specified targets (thresholds) for rewards have been considered in the literature, where the goal is to penalize the risks of revenues falling below those targets. Existing work on MDPs that takes targets into account seeks to minimize risks of this nature. Minimizing risks can lead to poor solutions where the risk is zero or near zero, but the average rewards are also rather low. In this paper, hence, we study a risk-averse criterion, in particular the so-called downside risk, which equals the probability of the revenues falling below a given target, where, in contrast to minimizing such risks, we only reduce this risk at the cost of slightly lowered average rewards. A solution where the risk is low and the average reward is quite high, although not at its maximum attainable value, is very attractive in practice. To be more specific, in our formulation, the objective function is the expected value of the rewards minus a scalar times the downside risk. In this setting, we analyze the infinite horizon MDP, the finite horizon MDP, and the infinite horizon semi-MDP (SMDP). We develop dynamic programming and reinforcement learning algorithms for the finite and infinite horizon. The algorithms are tested in numerical studies and show encouraging performance.
Keywords: Downside risk; Markov decision processes; reinforcement learning; dynamic programming; targets; thresholds
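The objective described in this abstract can be written compactly as follows. This is only a sketch of the verbal description: the symbols $\pi$ (policy), $R$ (reward or revenue), $\tau$ (target) and $\theta$ (the scalar weight) are notation assumed here for illustration and may differ from the paper's own.

```latex
% Downside risk of a policy: probability that the reward falls below the target.
\mathrm{DR}_{\pi}(\tau) = \Pr\nolimits_{\pi}\left( R < \tau \right)

% Risk-adjusted objective: expected reward minus a scalar times the downside risk.
\max_{\pi} \; \mathbb{E}_{\pi}\left[ R \right] - \theta \, \mathrm{DR}_{\pi}(\tau),
\qquad \theta \ge 0
```

Setting $\theta = 0$ recovers the risk-neutral criterion, while a larger $\theta$ trades average reward for a lower probability of missing the target.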
SINGULARLY PERTURBED MARKOV DECISION PROCESSES WITH INCLUSION OF TRANSIENT STATES (Cited by: 1)
17
Authors: R. H. Liu, Q. Zhang, G. Yin. Journal of Systems Science & Complexity, SCIE EI CSCD, 2001, Issue 2, pp. 199-211, 13 pages
This paper is concerned with continuous-time Markov decision processes (MDPs) having weak and strong interactions. Using a hierarchical approach, the state space of the underlying Markov chain can be decomposed into several groups of recurrent states and a group of transient states, resulting in a singularly perturbed MDP formulation. Instead of solving the original problem directly, a limit problem that is much simpler to handle is derived. On the basis of the optimal solution of the limit problem, nearly optimal decisions are constructed for the original problem. The asymptotic optimality of the constructed control is obtained; the rate of convergence is ascertained.
Keywords: Markov decision process; dynamic programming; asymptotically optimal control
First Passage Risk Probability Minimization for Piecewise Deterministic Markov Decision Processes (Cited by: 1)
18
Authors: Xin WEN, Hai-feng HUO, Xian-ping GUO. Acta Mathematicae Applicatae Sinica, SCIE CSCD, 2022, Issue 3, pp. 549-567, 19 pages
This paper is an attempt to study the minimization problem of the risk probability of piecewise deterministic Markov decision processes (PDMDPs) with unbounded transition rates and Borel spaces. Different from the expected discounted and average criteria in the existing literature, we consider the risk probability that the total rewards produced by a system do not exceed a prescribed goal during a first passage time to some target set, and aim to find a policy that minimizes the risk probability over the class of all history-dependent policies. Under suitable conditions, we derive the optimality equation (OE) for the probability criterion, prove that the value function of the minimization problem is the unique solution to the OE, and establish the existence of ε(≥0)-optimal policies. Finally, we provide two examples to illustrate our results.
Keywords: piecewise deterministic Markov decision processes; risk probability; first passage time; ε-optimal policy
A Comparison of PPO, TD3 and SAC Reinforcement Algorithms for Quadruped Walking Gait Generation
19
Authors: James W. Mock, Suresh S. Muknahallipatna. Journal of Intelligent Learning Systems and Applications, 2023, Issue 1, pp. 36-56, 21 pages
Deep reinforcement learning (deep RL) has the potential to replace classic robotic controllers. State-of-the-art Deep Reinforcement algorithms such as Proximal Policy Optimization, Twin Delayed Deep Deterministic Policy Gradient and Soft Actor-Critic Reinforcement Algorithms, to mention a few, have been investigated for training robots to walk. However, conflicting performance results of these algorithms have been reported in the literature. In this work, we present the performance analysis of the above three state-of-the-art Deep Reinforcement algorithms for a constant velocity walking task on a quadruped. The performance is analyzed by simulating the walking task of a quadruped equipped with a range of sensors present on a physical quadruped robot. Simulations of the three algorithms across a range of sensor inputs and with domain randomization are performed. The strengths and weaknesses of each algorithm for the given task are discussed. We also identify a set of sensors that contribute to the best performance of each Deep Reinforcement algorithm.
Keywords: Reinforcement Learning; Machine Learning; Markov decision process; Domain Randomization
Analysis of a POMDP Model for an Optimal Maintenance Problem with Multiple Imperfect Repairs
20
Author: Nobuyuki Tamura. American Journal of Operations Research, 2023, Issue 6, pp. 133-146, 14 pages
I consider a system whose deterioration follows a discrete-time and discrete-state Markov chain with an absorbing state. When the system is put into practice, I may select operation (wait), imperfect repair, or replacement at each discrete-time point. The true state of the system is not known when it is operated. Instead, the system is monitored after operation and some incomplete information concerned with the deterioration is obtained for decision making. Since there are multiple imperfect repairs, I can select one option from them when the imperfect repair is preferable to operation and replacement. To express this situation, I propose a POMDP model and theoretically investigate the structure of an optimal maintenance policy minimizing a total expected discounted cost for an unbounded horizon. Then two stochastic orders are used for the analysis of our problem.
Keywords: Partially Observable Markov decision process; Imperfect Repair; Stochastic Order; Monotone Property; Optimal Maintenance Policy