Journal Articles
Found 5 articles
1. A POMDP Value Iteration Algorithm Based on the Probability Distribution of the Optimal Policy (cited by: 4)
Authors: 刘峰, 王崇骏, 骆斌. 《电子学报》 (EI, CAS, CSCD, Peking University Core), 2016, No. 5, pp. 1078-1084.
Abstract: As POMDP problems in applications continue to grow in scale, heuristic methods based on the reachable region of the optimal policy have become a focus of current research. Although existing algorithms guarantee global optimality, their selection of the optimal action is not precise enough, which limits their efficiency. This paper proposes PBVIOP, a value iteration method based on the probability that each action is optimal. During depth-first heuristic exploration, the method uses Monte Carlo simulation to estimate, from the distribution of each action's value function between its upper and lower bounds, the probability that the action is optimal, and selects the action with the highest probability as the exploration policy. Experimental results on four benchmark problems show that PBVIOP converges to the globally optimal solution and markedly improves convergence efficiency.
Keywords: partially observable Markov decision process (POMDP); value iteration based on optimal-policy probability; Monte Carlo method
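The action-selection step the abstract describes can be sketched in a few lines. The sketch below samples each action's value between its bounds and counts how often each action wins; the uniform distribution on [lower, upper], the function name, and the example numbers are our assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def action_optimal_probabilities(q_lower, q_upper, n_samples=10_000, seed=None):
    """Monte Carlo estimate of the probability that each action is optimal,
    given lower/upper bounds on each action's value. Assumes (our choice,
    not necessarily the paper's) the true value is uniform on [lower, upper]."""
    rng = np.random.default_rng(seed)
    q_lower = np.asarray(q_lower, dtype=float)
    q_upper = np.asarray(q_upper, dtype=float)
    # One row per sample, one column per action; argmax picks the winner.
    samples = rng.uniform(q_lower, q_upper, size=(n_samples, q_lower.size))
    wins = np.bincount(samples.argmax(axis=1), minlength=q_lower.size)
    return wins / n_samples

# Three actions with overlapping bound intervals; explore the likeliest winner.
p = action_optimal_probabilities([0.2, 0.4, 0.1], [0.9, 0.7, 0.5], seed=0)
best_action = int(p.argmax())
```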
2. Adaptive dynamic programming for finite-horizon optimal control of linear time-varying discrete-time systems (cited by: 3)
Authors: Bo PANG, Tao BIAN, Zhong-Ping JIANG. 《Control Theory and Technology》 (EI, CSCD), 2019, No. 1, pp. 73-84.
Abstract: This paper studies data-driven, learning-based methods for the finite-horizon optimal control of linear time-varying discrete-time systems. First, a novel finite-horizon Policy Iteration (PI) method for linear time-varying discrete-time systems is presented, and its connections with existing infinite-horizon PI methods are discussed. Then, both data-driven off-policy PI and Value Iteration (VI) algorithms are derived to find approximate optimal controllers when the system dynamics are completely unknown. Under mild conditions, the proposed data-driven off-policy algorithms converge to the optimal solution. Finally, the effectiveness and feasibility of the developed methods are validated by a practical example of spacecraft attitude control.
Keywords: optimal control; time-varying systems; adaptive dynamic programming; policy iteration (PI); value iteration (VI)
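For reference, the model-based baseline that data-driven finite-horizon PI/VI algorithms approximate is the backward Riccati recursion below. This is the standard textbook computation under an assumed known model, not the paper's off-policy method; the function name and example system are ours.

```python
import numpy as np

def finite_horizon_lqr(A_seq, B_seq, Q, R, Qf):
    """Backward Riccati recursion for min sum_k (x'Qx + u'Ru) + x_N'Qf x_N
    subject to x_{k+1} = A_k x_k + B_k u_k, returning the time-varying
    feedback gains K_k with u_k = -K_k x_k."""
    P = Qf
    gains = [None] * len(A_seq)
    for k in reversed(range(len(A_seq))):
        A, B = A_seq[k], B_seq[k]
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        gains[k] = K
        P = Q + A.T @ P @ (A - B @ K)   # Riccati difference equation
    return gains

# Scalar time-varying example over N = 3 stages.
A_seq = [np.array([[1.0 + 0.1 * k]]) for k in range(3)]
B_seq = [np.eye(1)] * 3
gains = finite_horizon_lqr(A_seq, B_seq, Q=np.eye(1), R=np.eye(1), Qf=np.eye(1))
```

The data-driven algorithms in the paper recover such gains from measured input-state data rather than from the matrices (A_k, B_k) themselves.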
3. Optimal stopping time on discounted semi-Markov processes
Authors: Fang CHEN, Xianping GUO, Zhong-Wei LIAO. 《Frontiers of Mathematics in China》 (SCIE, CSCD), 2021, No. 2, pp. 303-324.
Abstract: This paper studies the optimal stopping time for semi-Markov processes (SMPs) under the discounted optimality criterion with unbounded cost rates. We introduce an explicit construction of equivalent semi-Markov decision processes (SMDPs). The equivalence is embodied in the expected discounted cost functions of the SMPs and SMDPs: every stopping time of the SMP induces a policy of the SMDP with the same value function, and vice versa. The existence of the optimal stopping time of SMPs is proved via this equivalence. Next, we give the optimality equation for the value function and develop an effective iterative algorithm for computing it. Moreover, we show that optimal and ε-optimal stopping times can be characterized as hitting times of special sets. Finally, an example of a maintenance system illustrates the validity of the results.
Keywords: optimal stopping time; semi-Markov processes (SMPs); value function; semi-Markov decision processes (SMDPs); optimal policy; iterative algorithm
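A finite-state, discrete-time simplification of the optimality equation and iterative algorithm mentioned in the abstract might look as follows. In the paper's semi-Markov setting the discounting additionally depends on random sojourn times, which this sketch omits; all names and the reward/cost convention are ours.

```python
import numpy as np

def optimal_stopping_value_iteration(P, stop_reward, run_cost, beta,
                                     tol=1e-10, max_iter=10_000):
    """Value iteration for V(s) = max( g(s), -c(s) + beta * sum_s' P[s,s'] V(s') )
    on a finite discrete-time chain: g is the reward for stopping in s,
    c the running cost for continuing, beta in (0, 1) the discount factor."""
    stop_reward = np.asarray(stop_reward, dtype=float)
    run_cost = np.asarray(run_cost, dtype=float)
    V = stop_reward.copy()
    for _ in range(max_iter):
        continue_value = -run_cost + beta * P @ V
        V_new = np.maximum(stop_reward, continue_value)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    # The optimal rule stops on first entry into this set (a hitting time).
    stop_set = stop_reward >= continue_value
    return V, stop_set
```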
4. An Infinite-Horizon Exponential-Utility Optimality Model Based on Semi-Markov Processes
Authors: 温鲜, 霍海峰. 《应用概率统计》 (CSCD, Peking University Core), 2023, No. 4, pp. 577-588.
Abstract: This paper considers the exponential-utility optimality problem for semi-Markov decision processes, in which the state and action spaces are Borel sets and the reward function is nonnegative. The optimality criterion is to maximize the expected exponential utility of the total reward accrued by the system over an infinite horizon. First, standard regularity conditions are established to ensure that the state process is non-explosive, and continuity-compactness conditions ensure that an optimal policy exists. Then, under these conditions, value iteration and the embedded-chain technique are used to prove that the value function is the unique solution of the corresponding optimality equation and that an optimal policy exists. Finally, an example demonstrates how the value iteration algorithm computes the value function and the optimal policy.
Keywords: semi-Markov decision processes; exponential utility; value iteration; optimality equation; optimal policy
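A minimal finite-state, discrete-time sketch of exponential-utility value iteration is given below, assuming the multiplicative Bellman backup W(s) = max_a exp(γ·r(s,a)) · E[W(s') | s, a]. The paper's Borel-space setting, regularity conditions, and embedded-chain construction are only gestured at in the comments; convergence of this iteration rests on conditions analogous to those assumptions.

```python
import numpy as np

def exp_utility_value_iteration(P, r, gamma, n_iter=500, tol=1e-10):
    """Value iteration for maximizing E[exp(gamma * total reward)] on a
    finite MDP: P has shape (n_actions, n_states, n_states), r has shape
    (n_states, n_actions), gamma is the risk-sensitivity parameter. In the
    paper's semi-Markov model the embedded chain supplies the transition law."""
    W = np.ones(r.shape[0])
    for _ in range(n_iter):
        expected = np.einsum('ast,t->sa', P, W)   # E[W(s') | s, a]
        Q = np.exp(gamma * r) * expected          # multiplicative backup
        W_new = Q.max(axis=1)
        if np.max(np.abs(W_new - W)) < tol:
            break
        W = W_new
    policy = Q.argmax(axis=1)                     # greedy optimal policy
    return W, policy
```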
5. Applying Markov Processes to the Study of Price Fluctuations: Implementing Policy Iteration in an Economic System with Monetary Losses
Author: 马文. 《贵州师范大学学报(自然科学版)》 (CAS), 1993, No. 1, pp. 24-32.
Abstract: Using the theory of Markov processes, this paper studies the laws governing the price changes of a certain commodity. This is the third part of a series of studies.
Keywords: Markov process; transition probability matrix; commodity prices
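The abstract gives no algorithmic details, but the policy iteration named in the title is the classical Howard procedure. A generic discounted-MDP version is sketched below, with no attempt to reproduce the paper's price-fluctuation model; the shapes and names are ours.

```python
import numpy as np

def policy_iteration(P, r, beta):
    """Howard-style policy iteration for a discounted MDP:
    P has shape (n_actions, n_states, n_states), r has shape
    (n_states, n_actions), beta in (0, 1) is the discount factor."""
    n_states, _ = r.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - beta * P_pi) V = r_pi exactly.
        P_pi = P[policy, np.arange(n_states), :]
        r_pi = r[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - beta * P_pi, r_pi)
        # Policy improvement: act greedily with respect to V.
        Q = r + beta * np.einsum('ast,t->sa', P, V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy
```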