Logical Semi-Markov Decision Processes and Q-Learning
Abstract  Reinforcement learning has been developing toward relational reinforcement learning, yielding many new algorithms, most of which upgrade propositional representations to relational or computational-logic representations. This paper presents a novel representation formalism, the logical semi-Markov decision process, which integrates semi-Markov decision processes with logic programs. Within this framework, abstraction over states and actions is fundamental. A Q-learning algorithm for logical semi-Markov decision processes is then given, together with a proof of its convergence. The framework provides a sound basis for further development of relational reinforcement learning in continuous-time domains.
Source  Journal of Jinling Institute of Technology (《金陵科技学院学报》), 2013, No. 2, pp. 13-19 (7 pages)
Funding  Jinling Institute of Technology Research Fund Project (No. jit-b-201207)
Keywords  relational reinforcement learning; semi-Markov; logical semi-Markov decision process
