

Two-Timescale Simulation-based Algorithm for Markov Decision Process Based on Performance Potentials
Abstract: By introducing the concept of two timescales into performance-potential-based stochastic approximation, a novel two-timescale simulation-based gradient algorithm is proposed for discrete-time Markov decision processes. The algorithm remedies the limitations of classical approaches, in which the every-update gradient algorithm updates too frequently and the regenerative-update gradient algorithm updates too infrequently. Three numerical examples illustrate the advantages of the two-timescale algorithm in computational complexity, convergence speed, and convergence precision.
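The core tool the abstract builds on is two-timescale stochastic approximation: a fast iterate tracks a noisy estimate while a slow iterate optimizes against it as if it were quasi-static. The sketch below is a generic illustration of that coupling, not the paper's algorithm; the target value `c`, the quadratic objective, and the step-size exponents are illustrative assumptions.

```python
import random

# Generic two-timescale stochastic approximation sketch (illustrative only).
# Fast iterate y: estimates an unknown mean c from noisy samples.
# Slow iterate x: takes gradient steps on (x - y)^2 / 2, treating y as
# quasi-static because its step size shrinks faster (b_n / a_n -> 0).

random.seed(0)
c = 2.0   # unknown quantity, observed only through noise (assumed for the demo)
y = 0.0   # fast iterate
x = 0.0   # slow iterate

for n in range(1, 200_001):
    a_n = n ** -0.6             # fast step size (decays slowly)
    b_n = n ** -0.9             # slow step size (decays faster)
    sample = random.gauss(c, 1.0)
    y += a_n * (sample - y)     # fast timescale: noisy averaging toward c
    x += b_n * (y - x)          # slow timescale: gradient step using y

# Both iterates settle near c: y estimates the noisy quantity,
# and x converges to the same limit while seeing y as almost constant.
print(round(y, 2), round(x, 2))
```

In the paper's setting the fast timescale would estimate performance potentials from the simulated trajectory and the slow timescale would update the policy parameters, which is what lets the update rate sit between the every-update and regenerative-update extremes.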
Published in: Journal of System Simulation (《系统仿真学报》; CAS, CSCD, Peking University core journal), 2009, No. 13, pp. 4114-4119.
Funding: National Natural Science Foundation of China (60574065, 60774038)
Keywords: Markov decision process; performance potential; two-timescale; stochastic approximation
