Abstract
Reinforcement learning has begun to develop towards relational reinforcement learning, and many new algorithms have been produced. Most of these approaches upgrade propositional representations to relational or computational-logic representations. They have exhibited many good properties, but the corresponding theoretical analysis is still lacking, namely, why does relational reinforcement learning have these good properties? We therefore propose measure-space structures for the underlying (ground) Markov decision process and for the logical Markov decision process, and use the theories of conditional expectation and regular conditional probability from modern probability theory to establish a deep connection between the two kinds of Markov decision processes, thereby confirming that an optimal policy of the logical Markov decision process is, in a certain average sense, an optimal policy of the corresponding ground Markov decision process. Finally, a logical Markov decision programming method is obtained through example analysis. Building the measure-space structure of logical Markov decision processes can provide a mathematical framework for relational reinforcement learning.
Because of the very large state spaces arising in the real world, reinforcement learning has been developing towards relational reinforcement learning, and many approaches have been presented, such as logical Markov decision processes. Many of these approaches upgrade propositional representations towards relational or computational-logic representations. These approaches have already shown many good qualities. However, a corresponding theory is missing, that is, why do these relational reinforcement learning approaches have such good qualities? So we construct a ground measure space for the underlying Markov decision processes and a logical measure space for logical Markov decision processes, and then use two profound concepts from modern probability theory, conditional expectation and regular conditional probability, to combine the two spaces. In this way, we establish a deep relationship between the underlying Markov decision process (MDP) and the logical Markov decision process. Within this mathematical framework we prove that an optimal policy found at the abstraction level is always optimal at the ground level of the underlying Markov decision process in some average sense. Many relational reinforcement learning techniques have this property, but do not give such a proof. Moreover, we give a definite semantics for the probability and the reward function in an abstract transition of logical Markov decision processes. The Markov decision process built on both the ground measure space and the logical measure space also reflects an important characteristic of the human mind: when facing various problems, especially complex ones, people always tackle them first from an abstract or principled perspective; having obtained a whole plan, they then consider the details; finally, the plan is actually carried out. In this paper, we think of problems at two different levels. The logical MDP corresponds to an abstract level that avoids a tremendous number of explicit states, thus making problems simpler. When we find an optimal policy in the logical MDP, we have obtained an optimal solution in the ground MDP in the average sense. Many techniques of relational reinforcement learning share this characteristic, and within the framework constructed in this paper the proof of this characteristic is very clear. In summary, we believe this framework will not only bring a stochastic and intelligent style to reinforcement learning, but also provide a sound basis for verifying the validity of logical Markov decision process theory. Moreover, it can provide a new pattern for studying the characteristics of the human mind. Just as people could manufacture aircraft only after deeply understanding aerodynamics, we can deepen the study of AI only when we have deeply understood the essence of the human mind. The two-space view of Markov decision processes is just an initial effort to deepen that understanding.
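To make the claimed connection concrete, the following is a minimal formal sketch in LaTeX, assuming a standard measure-theoretic setup; the notation (a ground state space S with probability measure μ, abstract logical states Z as measurable blocks of S, ground transition kernel P, ground reward r, and value function V^π) is chosen here for illustration and is not taken verbatim from the paper.

% Sketch under assumed notation: (S, \mathcal{F}, \mu) is the ground state
% space; Z, Z' \subseteq S are abstract (logical) states; a is an action.
% Abstract transition probability: a regular conditional probability,
% i.e. the \mu-average of the ground transition kernel over the block Z.
\[
  p(Z' \mid Z, a) \;=\; \frac{1}{\mu(Z)} \int_{Z} P(s, a, Z')\, \mu(\mathrm{d}s)
\]
% Abstract reward: the conditional expectation of the ground reward given Z.
\[
  R(Z, a) \;=\; \mathbb{E}\left[\, r(s, a) \mid s \in Z \,\right]
  \;=\; \frac{1}{\mu(Z)} \int_{Z} r(s, a)\, \mu(\mathrm{d}s)
\]
% Average-sense optimality: a policy \pi^{*} optimal for the logical MDP
% maximizes the \mu-average ground value on every abstract state Z.
\[
  \frac{1}{\mu(Z)} \int_{Z} V^{\pi^{*}}(s)\, \mu(\mathrm{d}s)
  \;\ge\;
  \frac{1}{\mu(Z)} \int_{Z} V^{\pi}(s)\, \mu(\mathrm{d}s)
  \qquad \text{for every policy } \pi \text{ and abstract state } Z.
\]

Read this way, "optimal in some average sense" means the abstract policy cannot be beaten in μ-average value on any abstract state, even though individual ground states inside a block Z may still admit better state-specific actions.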
Source: 《南京大学学报(自然科学版)》 (Journal of Nanjing University (Natural Science)), 2013, No. 4, pp. 439-447 (9 pages)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
Fund: Scientific Research Foundation of Jinling Institute of Technology (jit-b-201207)