Journal Articles
19 articles found
1. Policy Iteration for Optimal Control of Discrete-Time Time-Varying Nonlinear Systems (cited 1 time)
Authors: Guangyu Zhu, Xiaolu Li, Ranran Sun, Yiyuan Yang, Peng Zhang. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2023, Issue 3, pp. 781-791.
Aimed at infinite-horizon optimal control problems for discrete-time time-varying nonlinear systems, this paper develops a new iterative adaptive dynamic programming algorithm, the discrete-time time-varying (DTTV) policy iteration algorithm. The iterative control law is designed to update the iterative value function, which approximates the optimal performance index function. The admissibility of the iterative control law is analyzed. The results show that the iterative value function is monotonically non-increasing and converges to the optimal solution of the Bellman equation. To implement the algorithm, neural networks are employed and a new implementation structure is established that avoids solving the generalized Bellman equation in each iteration. Finally, the optimal control laws for torsional pendulum and inverted pendulum systems are obtained using the DTTV policy iteration algorithm, where the mass and pendulum bar length are permitted to be time-varying parameters. The effectiveness of the developed method is illustrated by numerical results and comparisons.
Keywords: adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; optimal control; policy iteration; time-varying
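The abstract above uses policy iteration in its generic evaluate-then-improve form. As a point of reference only, the following is a minimal sketch of tabular policy iteration for a finite, discounted, reward-maximizing Markov decision process; the paper itself treats time-varying nonlinear dynamics with neural-network value-function approximation, which this toy version does not attempt to reproduce, and the array conventions below are illustrative assumptions.

```python
import numpy as np

def policy_iteration(P, r, gamma=0.95):
    """Tabular policy iteration for a finite MDP.
    P: (A, S, S) transition probabilities, r: (S, A) expected rewards."""
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)            # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly
        P_pi = P[policy, np.arange(n_states), :]       # (S, S) transitions under policy
        r_pi = r[np.arange(n_states), policy]          # (S,)  rewards under policy
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: greedy one-step lookahead on the evaluated values
        q = r + gamma * np.einsum('asj,j->sa', P, v)   # (S, A) action values
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v                           # policy is greedy w.r.t. its own value
        policy = new_policy
```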
2. Adaptive Optimal Discrete-Time Output-Feedback Using an Internal Model Principle and Adaptive Dynamic Programming (cited 1 time)
Authors: Zhongyang Wang, Youqing Wang, Zdzisław Kowalczuk. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, Issue 1, pp. 131-140.
To address the output-feedback problem for linear discrete-time systems, this work proposes a new adaptive dynamic programming (ADP) technique based on the internal model principle (IMP). The proposed method, termed IMP-ADP, does not require complete state feedback, only the measurement of input and output data. More specifically, based on the IMP, the output control problem is first converted into a stabilization problem. We then design an observer to reconstruct the full state of the system from the measured inputs and outputs. Moreover, the technique includes both a policy iteration algorithm and a value iteration algorithm to determine the optimal feedback gain without using a dynamic system model. Importantly, with this approach one does not need to solve the regulator equation. Finally, the control method was tested on a grid-connected LCL inverter system to demonstrate that it provides the desired performance in terms of both tracking and disturbance rejection.
Keywords: adaptive dynamic programming (ADP); internal model principle (IMP); output feedback problem; policy iteration (PI); value iteration (VI)
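The entry relies on policy iteration and value iteration to compute an optimal feedback gain from data. For orientation, here is a minimal model-based sketch of policy iteration for the discrete-time LQR problem (Hewer-style): policy evaluation solves a discrete Lyapunov equation and policy improvement recomputes the gain. The IMP-ADP method in the paper replaces this model-based evaluation with input-output data; the signature and names below are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def dt_lqr_policy_iteration(A, B, Q, R, K0, n_iter=50, tol=1e-10):
    """Model-based policy iteration for discrete-time LQR.
    K0 must stabilize x[k+1] = (A - B K0) x[k]."""
    K = K0
    for _ in range(n_iter):
        Acl = A - B @ K
        # Policy evaluation: P = Acl' P Acl + Q + K' R K
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        # Policy improvement: K = (R + B' P B)^(-1) B' P A
        K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        if np.linalg.norm(K_new - K) < tol:
            break
        K = K_new
    return K, P
```

Started from a stabilizing gain, the iterates converge to the stabilizing solution of the discrete algebraic Riccati equation, whose feedback gain is the quantity the data-driven PI/VI routines in the paper aim to recover without a model.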
3. Adaptive Optimal Control of Space Tether System for Payload Capture via Policy Iteration (cited 2 times)
Authors: FENG Yiting, ZHANG Ming, GUO Wenhao, WANG Changqing. Transactions of Nanjing University of Aeronautics and Astronautics (EI, CSCD), 2021, Issue 4, pp. 560-570.
The libration control problem of a space tether system (STS) for post-capture of a payload is studied. The process of payload capture causes the tether to swing and deviate from its nominal position, which can result in failure of the capture mission. Because the inertial parameters are unknown after capturing the payload, an adaptive optimal control based on policy iteration is developed to stabilize the uncertain dynamic system in the post-capture phase. By introducing an integral reinforcement learning (IRL) scheme, the algebraic Riccati equation (ARE) can be solved online without known dynamics. To avoid the computational burden of the iteration equations, an online implementation of the policy iteration algorithm is provided via the least-squares solution method. Finally, the effectiveness of the algorithm is validated by numerical simulations.
Keywords: space tether system (STS); payload capture; policy iteration; integral reinforcement learning (IRL); state feedback
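The abstract solves the algebraic Riccati equation online through policy iteration with integral reinforcement learning (IRL). As an assumed reference point, the classical model-based counterpart for continuous-time LQR (Kleinman's algorithm) is sketched below: each iteration solves a Lyapunov equation for the current gain and then updates the gain. The IRL scheme replaces the Lyapunov solve with a least-squares fit over measured trajectory data, which is what allows the unknown post-capture inertial parameters to be handled; this sketch is not the paper's implementation.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def ct_lqr_policy_iteration(A, B, Q, R, K0, n_iter=50, tol=1e-10):
    """Kleinman-style policy iteration for continuous-time LQR.
    K0 must make A - B K0 Hurwitz."""
    Rinv = np.linalg.inv(R)
    K = K0
    for _ in range(n_iter):
        Acl = A - B @ K
        # Policy evaluation: Acl' P + P Acl + Q + K' R K = 0
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        # Policy improvement: K = R^(-1) B' P
        K_new = Rinv @ B.T @ P
        if np.linalg.norm(K_new - K) < tol:
            break
        K = K_new
    return K, P
```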
4. Multiagent Reinforcement Learning: Rollout and Policy Iteration (cited 1 time)
Authors: Dimitri Bertsekas. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2021, Issue 2, pp. 249-272.
We discuss the solution of complex multistage decision problems using methods that are based on the idea of policy iteration (PI), i.e., start from some base policy and generate an improved policy. Rollout is the simplest method of this type, where just one improved policy is generated. We can view PI as repeated application of rollout, where the rollout policy at each iteration serves as the base policy for the next iteration. In contrast with PI, rollout has a robustness property: it can be applied on-line and is suitable for on-line replanning. Moreover, rollout can use as base policy one of the policies produced by PI, thereby improving on that policy. This is the type of scheme underlying the prominently successful AlphaZero chess program. In this paper we focus on rollout and PI-like methods for problems where the control consists of multiple components, each selected (conceptually) by a separate agent. This is the class of multiagent problems where the agents have a shared objective function, and shared and perfect state information. Based on a problem reformulation that trades off control space complexity with state space complexity, we develop an approach whereby, at every stage, the agents sequentially (one at a time) execute a local rollout algorithm that uses a base policy, together with some coordinating information from the other agents. The amount of total computation required at every stage grows linearly with the number of agents. By contrast, in the standard rollout algorithm, the amount of total computation grows exponentially with the number of agents. Despite the dramatic reduction in required computation, we show that our multiagent rollout algorithm has the fundamental cost improvement property of standard rollout: it guarantees improved performance relative to the base policy. We also discuss autonomous multiagent rollout schemes that allow the agents to make decisions autonomously through the use of precomputed signaling information, which is sufficient to maintain the cost improvement property without any on-line coordination of control selection between the agents. For discounted and other infinite horizon problems, we also consider exact and approximate PI algorithms involving a new type of one-agent-at-a-time policy improvement operation. For one of our PI algorithms, we prove convergence to an agent-by-agent optimal policy, thus establishing a connection with the theory of teams. For another PI algorithm, which is executed over a more complex state space, we prove convergence to an optimal policy. Approximate forms of these algorithms are also given, based on the use of policy and value neural networks. These PI algorithms, in both their exact and their approximate form, are strictly off-line methods, but they can be used to provide a base policy for use in an on-line multiagent rollout scheme.
Keywords: dynamic programming; multiagent problems; neuro-dynamic programming; policy iteration; reinforcement learning; rollout
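The abstract's central device is the one-agent-at-a-time rollout step: each agent in turn optimizes only its own control component, with earlier agents' choices fixed and later agents following the base policy, so per-stage computation grows linearly rather than exponentially in the number of agents. The schematic Python sketch below renders that single-stage step under assumed interfaces (base_policy, agent_action_sets, and a cost estimate q_factor, e.g., obtained by simulating the base policy from the next state); it is not the paper's implementation.

```python
def multiagent_rollout_step(state, base_policy, agent_action_sets, q_factor):
    """One stage of one-agent-at-a-time rollout (cost-minimizing).
    base_policy(state) -> tuple with one control component per agent.
    q_factor(state, joint_action) -> approximate cost of applying joint_action
    at state and following the base policy afterwards."""
    joint = list(base_policy(state))              # start from the base policy's choice
    for i in range(len(agent_action_sets)):       # agents decide one at a time
        best_u, best_q = joint[i], float('inf')
        for u in agent_action_sets[i]:            # enumerate only agent i's own actions
            candidate = joint[:i] + [u] + joint[i + 1:]
            q = q_factor(state, tuple(candidate))
            if q < best_q:
                best_u, best_q = u, q
        joint[i] = best_u                         # agent i's component is now fixed
    return tuple(joint)
```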
5. A policy iteration method for improving robot assembly trajectory efficiency (cited 1 time)
Authors: Qi ZHANG, Zongwu XIE, Baoshi CAO, Yang LIU. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2023, Issue 3, pp. 436-448.
Bolt assembly by robots is a vital and difficult task for replacing astronauts in extravehicular activities (EVA), but trajectory efficiency still needs to be improved during insertion of the wrench into the hex hole of the bolt. In this paper, a policy iteration method based on reinforcement learning (RL) is proposed, by which the problem of trajectory efficiency improvement is cast as an RL-based objective optimization problem. First, the projection relation between raw data and the state-action space is established, and a policy iteration initialization method is then designed based on this projection to provide the initial policy for iteration. Policy iteration based on the protective policy is applied to continuously evaluate and optimize the action-value function of all state-action pairs until convergence is obtained. To verify the feasibility and effectiveness of the proposed method, a non-contact demonstration experiment with human supervision is performed. Experimental results show that the initialization policy and the generated policy can be obtained by the policy iteration method within a limited number of demonstrations. A comparison between experiments with two different assembly tolerances shows that the convergent generated policy possesses higher trajectory efficiency than the conservative one. In addition, this method ensures safety during the training process and improves the utilization efficiency of demonstration data.
Keywords: bolt assembly; policy initialization; policy iteration; reinforcement learning (RL); robotic assembly; trajectory efficiency
6. Performance sensitivities for parameterized Markov systems
Authors: Xiren CAO, Junyu ZHANG. Journal of Control Theory and Applications (EI), 2004, Issue 1, pp. 65-68.
It is known that performance potentials (or, equivalently, perturbation realization factors) can be used as building blocks for the performance sensitivities of Markov systems. In parameterized systems, changes in parameters may affect only some states, and the explicit transition probability matrix may not be known. In this paper, we use an example to show that potentials can be used to construct performance sensitivities in a more flexible way: only the potentials at the affected states need to be estimated, and the transition probability matrix need not be known. Policy iteration algorithms, which are simpler than the standard one, can be established.
Keywords: perturbation analysis; Markov decision processes; policy iteration; reinforcement learning; perturbation realization
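Performance potentials, which solve a Poisson equation for the chain, are the building blocks used in the abstract. The small sketch below shows, under the assumption that the transition matrix is available, one direct way to compute the average reward and the potentials; in the setting of the paper the matrix is not known and only the potentials of the affected states are estimated from sample paths, so this computation is merely an assumed baseline for comparison.

```python
import numpy as np

def performance_potentials(P, f):
    """Average reward eta and potentials g of an ergodic Markov chain.
    P: (S, S) row-stochastic transition matrix, f: (S,) reward vector."""
    S = P.shape[0]
    # Stationary distribution pi: solve pi P = pi with sum(pi) = 1
    A = np.vstack([P.T - np.eye(S), np.ones((1, S))])
    b = np.zeros(S + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    eta = pi @ f                                   # long-run average reward
    # Potentials: g = (I - P + 1*pi)^(-1) f solves the Poisson equation
    #   (I - P) g + eta * 1 = f
    g = np.linalg.solve(np.eye(S) - P + np.outer(np.ones(S), pi), f)
    # In potential-based policy improvement, action a is preferred at state s
    # when it increases f_a(s) + sum_j P_a(s, j) g(j).
    return eta, g
```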
7. A Lyapunov characterization of robust policy optimization
Authors: Leilei Cui, Zhong-Ping Jiang. Control Theory and Technology (EI, CSCD), 2023, Issue 3, pp. 374-389.
In this paper, we study the robustness of policy optimization (in particular, the Gauss-Newton gradient descent algorithm, which is equivalent to policy iteration in reinforcement learning) subject to noise at each iteration. By invoking the concept of input-to-state stability and using Lyapunov's direct method, it is shown that, if the noise is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution even in the presence of noise at each iteration. Explicit expressions for the upper bound on the noise and the size of the neighborhood to which the policies ultimately converge are provided. Based on Willems' fundamental lemma, a learning-based policy iteration algorithm is proposed. The persistent excitation condition can be readily guaranteed by checking the rank of the Hankel matrix associated with an exploration signal. The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is theoretically demonstrated by the input-to-state stability of policy iteration. Several numerical simulations are conducted to demonstrate the efficacy of the proposed method.
Keywords: policy optimization; policy iteration (PI); input-to-state stability (ISS); Lyapunov's direct method
8. A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems (cited 8 times)
Authors: WEI QingLai, LIU DeRong. Science China Information Sciences (SCIE, EI, CAS, CSCD), 2015, Issue 12, pp. 143-157.
In this paper, a novel iterative Q-learning algorithm, called the "policy iteration based deterministic Q-learning algorithm", is developed to solve optimal control problems for discrete-time deterministic nonlinear systems. The idea is to use an iterative adaptive dynamic programming (ADP) technique to construct the iterative control law that optimizes the iterative Q function. When the optimal Q function is obtained, the optimal control law can be achieved by directly minimizing the optimal Q function, so that a mathematical model of the system is not necessary. A convergence analysis shows that the iterative Q function is monotonically non-increasing and converges to the solution of the optimality equation. It is also proven that each of the iterative control laws is a stable control law. Neural networks are employed to implement the policy iteration based deterministic Q-learning algorithm by approximating the iterative Q function and the iterative control law, respectively. Finally, two simulation examples are presented to illustrate the performance of the developed algorithm.
Keywords: adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; Q-learning; policy iteration; neural networks; nonlinear systems; optimal control
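The algorithm alternates between evaluating an iterative Q function under the current control law and improving the law by minimizing that Q function. The tabular toy sketch below mirrors that structure on a discretized state-action grid; the paper works with neural-network approximation and an undiscounted cost, whereas this sketch introduces a discount factor gamma solely to keep the inner fixed-point loop convergent for arbitrary initial policies, and the index-valued interfaces f and U are illustrative assumptions.

```python
import numpy as np

def deterministic_q_policy_iteration(f, U, n_states, n_actions,
                                     gamma=0.95, n_outer=20, n_inner=200):
    """Tabular sketch of policy-iteration-style deterministic Q-learning.
    f(s, a) -> index of the deterministic successor state,
    U(s, a) -> stage cost."""
    Q = np.zeros((n_states, n_actions))
    policy = np.zeros(n_states, dtype=int)          # current control law (action indices)
    for _ in range(n_outer):
        # Policy evaluation: fixed point of
        #   Q(s, a) = U(s, a) + gamma * Q(f(s, a), policy(f(s, a)))
        for _ in range(n_inner):
            Q_new = np.empty_like(Q)
            for s in range(n_states):
                for a in range(n_actions):
                    s_next = f(s, a)
                    Q_new[s, a] = U(s, a) + gamma * Q[s_next, policy[s_next]]
            Q = Q_new
        # Policy improvement: minimize the evaluated Q function over actions
        policy = Q.argmin(axis=1)
    return Q, policy
```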
9. Approximate policy iteration: a survey and some new methods (cited 6 times)
Authors: Dimitri P. BERTSEKAS. Journal of Control Theory and Applications (EI), 2011, Issue 3, pp. 310-335.
We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced policy iteration, policy oscillation and chattering, and optimistic and distributed policy iteration. Our discussion of policy evaluation is couched in general terms and aims to unify the available methods in the light of recent research developments and to compare the two main policy evaluation approaches: projected equations and temporal differences (TD), and aggregation. In the context of these approaches, we survey two different types of simulation-based algorithms: matrix inversion methods, such as least-squares temporal difference (LSTD), and iterative methods, such as least-squares policy evaluation (LSPE) and TD(λ), and their scaled variants. We discuss a recent method, based on regression and regularization, which rectifies the unreliability of LSTD for nearly singular projected Bellman equations. An iterative version of this method belongs to the LSPE class of methods and provides the connecting link between LSTD and LSPE. Our discussion of policy improvement focuses on the role of policy oscillation and its effect on performance guarantees. We illustrate that policy evaluation, when done by the projected equation/TD approach, may lead to policy oscillation, but when done by aggregation it does not. This implies better error bounds and more regular performance for aggregation, at the expense of some loss of generality in cost function representation capability. Hard aggregation provides the connecting link between projected equation/TD-based and aggregation-based policy evaluation, and is characterized by favorable error bounds.
Keywords: dynamic programming; policy iteration; projected equation; aggregation; chattering; regularization
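Among the policy evaluation methods surveyed, LSTD solves the projected Bellman equation with a single matrix solve over sampled transitions, and regression/regularization is offered as a remedy for near-singularity. The following minimal sketch of LSTD(0) with ridge regularization is only meant to make the matrix-inversion viewpoint concrete under assumed array interfaces; it does not reproduce any specific method from the survey.

```python
import numpy as np

def lstd(features, rewards, next_features, gamma=0.99, reg=1e-6):
    """LSTD(0) policy evaluation from one trajectory under the evaluated policy.
    features, next_features: (T, d) arrays of phi(x_t) and phi(x_{t+1});
    rewards: (T,) one-stage rewards; reg: small ridge term, useful when the
    projected Bellman equation is nearly singular."""
    d = features.shape[1]
    A = features.T @ (features - gamma * next_features)   # d x d sample matrix
    b = features.T @ rewards                               # d-vector
    w = np.linalg.solve(A + reg * np.eye(d), b)
    return w    # approximate value function: V(x) ~ phi(x) @ w
```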
10. Policy Iteration Algorithms for Zero-Sum Stochastic Differential Games with Long-Run Average Payoff Criteria
Authors: José Daniel López-Barrientos. Journal of the Operations Research Society of China (EI), 2014, Issue 4, pp. 395-421.
This paper studies the policy iteration algorithm (PIA) for zero-sum stochastic differential games with the basic long-run average criterion, as well as with its more selective version, the so-called bias criterion. The system is assumed to be a nondegenerate diffusion. We use Lyapunov-like stability conditions that ensure the existence and boundedness of the solution to a certain Poisson equation. We also ensure the convergence of a sequence of such solutions, of the corresponding sequence of policies, and, ultimately, of the PIA.
Keywords: ergodic payoff criterion; zero-sum stochastic differential games; policy iteration algorithm; nondegenerate diffusions; Poisson equation; Schäl convergence; bias game
11. Graphical Minimax Game and Off-Policy Reinforcement Learning for Heterogeneous MASs with Spanning Tree Condition
Authors: Wei Dong, Jianan Wang, Chunyan Wang, Zhenqiang Qi, Zhengtao Ding. Guidance, Navigation and Control, 2021, Issue 3, pp. 1-23.
In this paper, the optimal consensus control problem is investigated for heterogeneous linear multi-agent systems (MASs) under a spanning tree condition, based on game theory and reinforcement learning. First, the graphical minimax game algebraic Riccati equation (ARE) is derived by converting the consensus problem into a zero-sum game between each agent and its neighbors. The asymptotic stability and minimax validity of the closed-loop systems are proved theoretically. Then, a data-driven off-policy reinforcement learning algorithm is proposed to learn the optimal control policy online without information about the system dynamics. A rank condition is established to guarantee the convergence of the proposed algorithm to the unique solution of the ARE. Finally, the effectiveness of the proposed method is demonstrated through a numerical simulation.
Keywords: consensus control; MASs; minimax game; reinforcement learning; data-driven control; policy iteration
12. Discrete-time dynamic graphical games: model-free reinforcement learning solution (cited 6 times)
Authors: Mohammed I. ABOUHEAF, Frank L. LEWIS, Magdi S. MAHMOUD, Dariusz G. MIKULSKI. Control Theory and Technology (EI, CSCD), 2015, Issue 1, pp. 55-69.
This paper introduces a model-free reinforcement learning technique that is used to solve a class of dynamic games known as dynamic graphical games. The graphical game arises from multi-agent dynamical systems in which pinning control is used to make all the agents synchronize to the state of a command generator or leader agent. Novel coupled Bellman equations and Hamiltonian functions are developed for the dynamic graphical games. The Hamiltonian mechanics are used to derive the necessary conditions for optimality. The solution for the dynamic graphical game is given in terms of the solution to a set of coupled Hamilton-Jacobi-Bellman equations developed herein. The Nash equilibrium solution for the graphical game is given in terms of the solution to the underlying coupled Hamilton-Jacobi-Bellman equations. An online model-free policy iteration algorithm is developed to learn the Nash solution for the dynamic graphical game. This algorithm does not require any knowledge of the agents' dynamics. A proof of convergence for this multi-agent learning algorithm is given under a mild assumption about the interconnectivity properties of the graph. A gradient descent technique with critic network structures is used to implement the policy iteration algorithm and solve the graphical game online in real time.
Keywords: dynamic graphical games; Nash equilibrium; discrete mechanics; optimal control; model-free reinforcement learning; policy iteration
13. Optimal Content Placement and Request Dispatching for Cloud-based Video Distribution Services (cited 2 times)
Authors: Zheng-Huan Zhang, Xiao-Feng Jiang, Hong-Sheng Xi. International Journal of Automation and Computing (EI, CSCD), 2016, Issue 6, pp. 529-540.
The rapid progress of cloud technology has attracted a growing number of video providers to consider deploying their streaming services onto cloud platforms for more cost-effective, scalable and reliable performance. In this paper, we use a Markov decision process model to formulate the dynamic deployment of cloud-based video services over multiple geographically distributed datacenters. We focus on maximizing the average profit for the video service provider over the long run and introduce an average performance criterion that reflects cost and user experience jointly. We develop an optimal algorithm based on sensitivity analysis and sample-based policy iteration to obtain the optimal video placement and request dispatching strategy. We demonstrate the optimality of our algorithm with a theoretical proof and discuss its practical feasibility. We conduct simulations to evaluate the performance of our algorithm, and the results show that our strategy can effectively cut down the total cost and guarantee users' quality of experience (QoE).
Keywords: cloud computing; video distribution; Markov decision process; sensitivity analysis; policy iteration
14. Controller Optimization for Multirate Systems Based on Reinforcement Learning (cited 2 times)
Authors: Zhan Li, Sheng-Ri Xue, Xing-Hu Yu, Hui-Jun Gao. International Journal of Automation and Computing (EI, CSCD), 2020, Issue 3, pp. 417-427.
The goal of this paper is to design a model-free optimal controller for multirate systems based on reinforcement learning. Sampled-data control systems are widely used in industrial production processes, and multirate sampling has attracted much attention in the study of sampled-data control theory. In this paper, we assume that the sampling periods for the state variables differ from the periods for the system inputs. Under this condition, we can obtain an equivalent discrete-time system using the lifting technique. We then provide an algorithm to solve the linear quadratic regulator (LQR) control problem for multirate systems with the use of matrix substitutions. Based on a reinforcement learning method, we use online policy iteration and off-policy algorithms to optimize the controller for multirate systems. Using the least-squares method, we convert the off-policy algorithm into a model-free reinforcement learning algorithm that requires only the input and output data of the system. Finally, we use an example to illustrate the applicability and efficiency of the above model-free algorithm.
Keywords: multirate system; reinforcement learning; policy iteration; optimal control; controller optimization
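The approach first lifts the multirate sampled-data model into an equivalent single-rate discrete-time system and then applies LQR and policy iteration. Sketched below, under the assumption that the state is measured once per frame while the input is updated p times per frame, is one common form of input lifting that stacks a frame's p input samples into a single long input vector; the exact lifting and matrix substitutions used in the paper may differ in detail.

```python
import numpy as np

def lift_multirate(A, B, p):
    """Lift x[j+1] = A x[j] + B u[j] over a frame of p input samples:
    x[(k+1)p] = A_l x[kp] + B_l [u[kp]; u[kp+1]; ...; u[kp+p-1]]."""
    A_l = np.linalg.matrix_power(A, p)
    # Column blocks A^(p-1) B, A^(p-2) B, ..., B
    B_l = np.hstack([np.linalg.matrix_power(A, p - 1 - i) @ B for i in range(p)])
    return A_l, B_l      # shapes (n, n) and (n, m*p)
```

The lifted pair (A_l, B_l) can then be passed to a standard discrete-time LQR or policy iteration routine operating at the slower frame rate.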
15. Optimal synchronization control for multi-agent systems with input saturation: a nonzero-sum game (cited 1 time)
Authors: Hongyang LI, Qinglai WEI. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2022, Issue 7, pp. 1010-1019.
This paper presents a novel optimal synchronization control method for multi-agent systems with input saturation. Multi-agent game theory is introduced to transform the optimal synchronization control problem into a multi-agent nonzero-sum game. Then, the Nash equilibrium can be achieved by solving the coupled Hamilton-Jacobi-Bellman (HJB) equations with nonquadratic input energy terms. A novel off-policy reinforcement learning method is presented to obtain the Nash equilibrium solution without the system models, and critic neural networks (NNs) and actor NNs are introduced to implement the presented method. Theoretical analysis is provided, which shows that the iterative control laws converge to the Nash equilibrium. Simulation results show the good performance of the presented method.
Keywords: optimal synchronization control; multi-agent systems; nonzero-sum game; adaptive dynamic programming; input saturation; off-policy reinforcement learning; policy iteration
16. Minimax Q-learning design for H∞ control of linear discrete-time systems
Authors: Xinxing LI, Lele XI, Wenzhong ZHA, Zhihong PENG. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2022, Issue 3, pp. 438-451.
The H∞ control method is an effective approach for attenuating the effect of disturbances on practical systems, but it is difficult to obtain an H∞ controller due to the nonlinear Hamilton-Jacobi-Isaacs equation, even for linear systems. This study deals with the design of an H∞ controller for linear discrete-time systems. To solve the related game algebraic Riccati equation (GARE), a novel model-free minimax Q-learning method is developed on the basis of an offline policy iteration algorithm, which is shown to be Newton's method for solving the GARE. The proposed minimax Q-learning method, which employs off-policy reinforcement learning, learns the optimal control policies for the controller and the disturbance online, using only the state samples generated by the implemented behavior policies. Different from existing Q-learning methods, a novel gradient-based policy improvement scheme is proposed. We prove that the minimax Q-learning method converges to the saddle-point solution under initially admissible control policies and an appropriate positive learning rate, provided that certain persistence of excitation (PE) conditions are satisfied. In addition, the PE conditions can be easily met by choosing appropriate behavior policies containing certain excitation noises, without causing any excitation noise bias. In the simulation study, we apply the proposed minimax Q-learning method to design an H∞ load-frequency controller for an electrical power system generator that suffers from load disturbance, and the simulation results indicate that the obtained H∞ load-frequency controller has good disturbance rejection performance.
Keywords: H∞ control; zero-sum dynamic game; reinforcement learning; adaptive dynamic programming; minimax Q-learning; policy iteration
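The design hinges on the game algebraic Riccati equation (GARE) coupling the controller and the disturbance. As a purely illustrative, assumed reference, the sketch below performs a fixed-point (value-iteration-style) sweep on a textbook discrete-time zero-sum LQ GARE for dynamics x+ = A x + B u + E w and stage cost x'Qx + u'Ru - gamma^2 w'w; the paper instead solves this equation model-free via minimax Q-learning, and its exact formulation may differ from the form assumed here.

```python
import numpy as np

def gare_fixed_point(A, B, E, Q, R, gamma, n_iter=500, tol=1e-9):
    """Iterate P <- Q + A'PA - L' S^(-1) L for an assumed zero-sum LQ game,
    where S couples the control and disturbance channels."""
    q = E.shape[1]
    P = np.zeros_like(Q)
    for _ in range(n_iter):
        S = np.block([[R + B.T @ P @ B,            B.T @ P @ E],
                      [E.T @ P @ B, E.T @ P @ E - gamma**2 * np.eye(q)]])
        L = np.vstack([B.T @ P @ A, E.T @ P @ A])
        P_new = Q + A.T @ P @ A - L.T @ np.linalg.solve(S, L)
        if np.linalg.norm(P_new - P) < tol:
            return P_new
        P = P_new
    return P
```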
17. Multilevel Techniques for the Solution of HJB Minimum-Time Control Problems
Authors: Gabriele CIARAMELLA, Giulia FABRINI. Journal of Systems Science & Complexity (SCIE, EI, CSCD), 2021, Issue 6, pp. 2069-2091.
The solution of minimum-time feedback optimal control problems is generally achieved using the dynamic programming approach, in which the value function must be computed on numerical grids with a very large number of points. Classical numerical strategies, such as value iteration (VI) or policy iteration (PI) methods, become very inefficient when the number of grid points is large. This is a strong limitation on their use in real-world applications. To address this problem, the authors present a novel multilevel framework in which classical VI and PI are embedded in a full-approximation storage (FAS) scheme. In fact, the authors show that VI and PI have excellent smoothing properties, a fact that makes them very suitable for use in multilevel frameworks. Moreover, a new smoother is developed by accelerating VI using Anderson's extrapolation technique. The effectiveness of the new scheme is demonstrated by several numerical experiments.
Keywords: Anderson acceleration; FAS; Hamilton-Jacobi equation; minimum-time problem; multi-level acceleration methods; policy iteration; value iteration
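The new smoother accelerates value iteration with Anderson's extrapolation, which combines the last few iterates so that the fixed-point residual is minimized in a least-squares sense. Below is a minimal, generic sketch of Anderson-accelerated fixed-point iteration applied to a user-supplied Bellman operator; the memory size, stopping rule, and the absence of damping or safeguards are simplifying assumptions and do not reflect the authors' multilevel FAS implementation.

```python
import numpy as np

def anderson_vi(bellman_op, v0, m=5, n_iter=200, tol=1e-10):
    """Anderson-accelerated value iteration with memory m.
    bellman_op(v) applies the Bellman (or discretized HJB) operator to v."""
    v = v0.copy()
    Gs, Fs = [], []                       # histories of T(v_i) and residuals T(v_i) - v_i
    for _ in range(n_iter):
        Tv = bellman_op(v)
        F = Tv - v
        if np.linalg.norm(F, np.inf) < tol:
            return Tv
        Gs.append(Tv)
        Fs.append(F)
        Gs, Fs = Gs[-m:], Fs[-m:]
        if len(Fs) == 1:
            v = Tv                        # plain value-iteration step
            continue
        # Weights alpha: minimize ||sum_i alpha_i F_i|| subject to sum_i alpha_i = 1
        Rmat = np.stack(Fs, axis=1)       # one residual column per stored iterate
        k = Rmat.shape[1]
        KKT = np.zeros((k + 1, k + 1))
        KKT[:k, :k] = Rmat.T @ Rmat
        KKT[:k, k] = 1.0
        KKT[k, :k] = 1.0
        rhs = np.zeros(k + 1)
        rhs[k] = 1.0
        alpha = np.linalg.lstsq(KKT, rhs, rcond=None)[0][:k]
        v = np.stack(Gs, axis=1) @ alpha  # extrapolated iterate
    return v
```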
18. Markov decision processes associated with two threshold probability criteria
Authors: Masahiko SAKAGUCHI, Yoshio OHTSUBO. Journal of Control Theory and Applications (EI, CSCD), 2013, Issue 4, pp. 548-557.
This paper deals with Markov decision processes with a target set for nonpositive rewards. Two types of threshold probability criteria are discussed. The first criterion is the probability that the total reward is not greater than a given initial threshold value, and the second is the probability that the total reward is less than it. Our first (resp. second) optimization problem is to minimize the first (resp. second) threshold probability. These problems suggest that the threshold value is a permissible level of the total reward for reaching a goal (the target set); that is, we would like to reach this set while keeping the total reward above this level, if possible. For both problems, we show that 1) the optimal threshold probability is the unique solution to an optimality equation, 2) there exists an optimal deterministic stationary policy, and 3) a value iteration and a policy space iteration are given. In addition, we prove that the first (resp. second) optimal threshold probability is a monotonically increasing and right (resp. left) continuous function of the initial threshold value, and we propose a method to obtain an optimal policy and the optimal threshold probability in the first problem by using them in the second problem.
Keywords: Markov decision process; minimizing risk model; threshold probability; policy space iteration
19. Optimization of Markov jump linear system with controlled modes jump probabilities
Authors: Yankai XU, Xi CHEN. Frontiers of Electrical and Electronic Engineering in China (CSCD), 2009, Issue 1, pp. 55-59.
The optimal control of a Markov jump linear quadratic model with controlled jump probabilities of the modes is investigated. Two kinds of mode control policies, i.e., open-loop and closed-loop control policies, are considered. Using the concepts of policy iteration and performance potential, a sufficient condition under which the optimal closed-loop control policy performs better than the optimal open-loop control policy is proposed. The condition is helpful for the design of an optimal controller. Furthermore, an efficient algorithm to construct a closed-loop control policy that is better than the optimal open-loop control policy is given, based on policy iteration.
Keywords: Markov jump system; optimal control; policy iteration