Logical Semi-Markov Decision Processes and Q-Learning
Abstract  Reinforcement learning has been developing toward relational reinforcement learning, yielding many new algorithms, most of which upgrade propositional representations to relational or computational-logic representations. This paper presents a novel representation formalism, the logical semi-Markov decision process, which integrates semi-Markov decision processes with logic programs. Within this framework, abstraction over states and actions is fundamental. A Q-learning algorithm for logical semi-Markov decision processes is then given, together with a proof of its convergence. The framework provides a sound basis for further development of relational reinforcement learning in continuous-time domains.
Source  Journal of Jinling Institute of Technology (《金陵科技学院学报》), 2013, No. 2, pp. 13-19 (7 pages)
Funding  Jinling Institute of Technology Research Fund Project (No. jit-b-201207)
Keywords  relational reinforcement learning; semi-Markov; logical semi-Markov decision process
