
A theoretical model of measure-valued Markov processes simulating the divergent thinking of man (Cited by: 1)
Abstract: This paper proposes a new model, the measure-valued Markov decision process. In this model, the agent's grasp of the environment is represented by the notion of a measure; the agent decides its optimal action according to this measure so as to obtain an optimal policy, and the paper accordingly provides an optimal-policy algorithm for measure-valued Markov decision processes. The model is a generalization of the partially observable Markov decision process. It reflects an important characteristic of human thinking: people reason about problems and choose their optimal actions while grasping all the state possibilities (i.e., while weighing and measuring the state space). The partially observable Markov decision process is merely a special case of it.

This paper presents a model called measure-valued Markov decision processes (MVMDPs), in which the agent's understanding of the environment is represented by the mathematical notion of a measure. The agent decides its optimal action according to this measure and thereby acquires its optimal policy, so we present an algorithm for finding the optimal policy under an MVMDP, which can also be regarded as an approximate optimal-policy algorithm for partially observable Markov decision processes (POMDPs). This model is a generalization of the POMDP; that is, the POMDP is a particular case of the measure-valued Markov decision process. Even so, it differs essentially from other work on POMDPs. First, the main idea of general POMDP research is to transform a partially observable Markov decision problem on a physical state space into a regular Markov decision problem (MDP) on the corresponding belief state space, and such research identifies the belief state with a probability distribution over the state space. Most POMDP models built on this idea therefore concentrate on algorithms of various kinds for finding the optimal policy and on novel refinements of existing techniques. Our work, however, is not based on the transformation between the POMDP on a physical state space and the MDP on a belief state space. Instead, we take the measure, a notion more general than the belief state, on the state space as a new object of study, and the Markov decision problem we discuss then takes place on the space composed of these measures. In this way we obtain a measure-valued Markov decision process. Second, the MVMDP, based on the recent theory of measure-valued branching processes in modern probability, reflects an important characteristic of the human mind: people think about problems and choose their optimal actions in contexts where all the possible states are grasped (i.e., they are able to appropriately measure the state space). In other words, in many cases when the solutions to a problem have not yet emerged, or even when the problem itself cannot yet be explicitly expressed, human thinking does not move from one definite point to another definite point over time, as logical reasoning does, but evolves through changes in the whole grasp of the problem; that is, it can creatively proceed from "area" to "area". This "area" is what we call a "measure", which reflects people's understanding of the environment. For this reason the human mind obeys the laws of quantum mechanics; that is, it exists in a probabilistic manner. In our model this phenomenon is embodied as the evolution of a random variable taking values among the measures on the state space. In summary, we believe the MVMDP not only deepens the understanding of MDPs and POMDPs but also provides a new pattern for studying the characteristics of the human mind. Just as people could manufacture aircraft only after they had deeply understood aerodynamics, we can deepen the study of AI only when we have deeply understood the essence of the human mind. The MVMDP is an initial effort toward that understanding.
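The abstract contrasts the MVMDP with the standard POMDP approach, in which the problem on the physical state space is transformed into an MDP on the belief state space and the belief state is identified with a probability distribution over the states. As a point of reference only, below is a minimal Python sketch of that standard belief update, the special case the paper generalizes; the function, model arrays, and toy numbers are illustrative assumptions, not taken from the paper.

import numpy as np

def belief_update(b, a, o, T, O):
    """Standard POMDP Bayes-filter update (the special case MVMDPs generalize):
    b'(s') is proportional to P(o | s', a) * sum_s P(s' | s, a) * b(s).

    b : (|S|,) belief vector summing to 1
    T : T[a, s, s'] = P(s' | s, a)   (hypothetical transition model)
    O : O[a, s', o] = P(o | s', a)   (hypothetical observation model)
    """
    b_pred = b @ T[a]            # predict: distribution over next states after action a
    b_new = O[a][:, o] * b_pred  # correct: weight by the likelihood of observation o
    return b_new / b_new.sum()   # renormalize so the result is again a distribution

# Toy example with 2 states, 2 actions, 2 observations (numbers are made up).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
              [[0.5, 0.5], [0.5, 0.5]]])  # transitions under action 1
O = np.array([[[0.8, 0.2], [0.3, 0.7]],   # observation likelihoods, action 0
              [[0.8, 0.2], [0.3, 0.7]]])  # observation likelihoods, action 1
b = np.array([0.5, 0.5])                  # uniform initial belief
print(belief_update(b, a=0, o=1, T=T, O=O))   # -> approximately [0.259, 0.741]

In the MVMDP setting described above, the evolving object would be a general (not necessarily normalized) measure on the state space rather than this belief vector.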
Source: Journal of Nanjing University (Natural Science), 2008, No. 2: 148-156 (9 pages). Indexed: CSCD; Peking University Core Journal.
Funding: National Natural Science Foundation of China (90412014); Open Project of Novel Computer Software Technology (A200707).
Keywords: measure-valued Markov decision processes; measure-valued branching processes; Markov decision processes

References (10)

  • 1 Russell S, Norvig P. Artificial Intelligence: A Modern Approach. 2nd edition. Jiang Zhe, trans. Beijing: Posts & Telecom Press, 2004: 482-483.
  • 2 Lovejoy W S. A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 1991, 28: 47-66.
  • 3 Roy N, Gordon G. Finding approximate POMDP solutions through belief compression. Journal of Artificial Intelligence Research, 2005, 23: 1-40.
  • 4 Spaan M T J, Vlassis N. Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research, 2005, 24: 195-220.
  • 5 Zhang N L, Zhang W. Speeding up the convergence of value iteration in partially observable Markov decision processes. Journal of Artificial Intelligence Research, 2001, 14: 29-51.
  • 6 Tuttle E, Ghahramani Z. Propagating uncertainty in POMDP value iteration with Gaussian processes. Technical report, Gatsby Computational Neuroscience Unit, 2004.
  • 7 Morisio M. Measurement processes are software, too. The Journal of Systems and Software, 1999, 49: 17-31.
  • 8 Lin W, Yang L L, Xu B L. Speaker recognition of Chinese whispered speech based on modified MFCC parameters. Journal of Nanjing University (Natural Science), 2006, 42(1): 54-62. (Cited by 23)
  • 9 Hauskrecht M. Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 2000, 13: 33-94.
  • 10 Munos R, Moore A. Variable resolution discretization in optimal control. Machine Learning, 2002, 49(2-3): 291-323.

Secondary references (12)

  • 1 Li X L, Ding H, Xu B L. Segmentation of initials and finals in whispered speech based on the entropy function. Acta Acustica, 2005, 30(1): 69-75. (Cited by 34)
  • 2 Yang L L, Li Y, Xu B L. Construction of a Chinese whispered speech corpus and auditory perception experiments. Journal of Nanjing University (Natural Science), 2005, 41(3): 311-317. (Cited by 13)
  • 3 Schwartz M F, Rine M F. Identification of speaker sex from isolated, whispered vowels. Journal of the Acoustical Society of America, 1968, 44(6): 1736-1737.
  • 4 Yu H. The whisper is not helpful for treating hoarseness and recovering voice. Journal of the Central University for Nationalities, 1996, 5(2): 163-166.
  • 5 Itoh T, Takeda K, Itakura F. Acoustic analysis and recognition of whispered speech. Proceedings of ICASSP, Orlando, Florida, USA, 2002: 389-392.
  • 6 Morris R W, Clements M A. Reconstruction of speech from whispers. Medical Engineering and Physics, 2002, 24(8): 515-520.
  • 7 Morris R W. Enhancement and recognition of whispered speech. PhD thesis, Georgia Institute of Technology, 2002.
  • 8 Li X L, Xu B L. Formant comparison between Mandarin whispered and voiced vowels. Acta Acustica united with Acustica, 2005, 91(6): 1-7.
  • 9 Bou-Ghazale S E, Hansen J H L. A comparative study of traditional and newly proposed features for recognition of speech under stress. IEEE Transactions on Speech and Audio Processing, 2000, 8(4): 429-442.
  • 10 Rabiner L, Juang B H. Fundamentals of Speech Recognition. Prentice Hall, 1993: 321-389.


Co-cited literature (13)

  • 1 Boutilier C, Reiter R, Price B. Symbolic dynamic programming for first-order MDPs. Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01), Seattle, USA, 2001: 690-700.
  • 2 Van Otterlo M. Reinforcement learning for relational MDPs. Proceedings of the Annual Machine Learning Conference of Belgium and the Netherlands, Brussels, Belgium, 2004: 138-145.
  • 3 Kersting K, De Raedt L. Logical Markov decision programs. Working Notes of the IJCAI 2003 Workshop on Learning Statistical Models from Relational Data (SRL-03), Acapulco, Mexico, 2003: 63-70.
  • 4 Kersting K, De Raedt L. Logical Markov decision programs and the convergence of logical TD(λ). Proceedings of the 14th International Conference on Inductive Logic Programming, Porto, Portugal, 2004: 180-197.
  • 5 Ravindran B, Barto A G. Symmetries and model minimization in Markov decision processes. Technical Report UM-CS-2001-043, University of Massachusetts, 2001.
  • 6 Ravindran B, Barto A G. SMDP homomorphisms: An algebraic approach to abstraction in semi-Markov decision processes. Proceedings of the 18th International Joint Conference on Artificial Intelligence, 2003: 1011-1016.
  • 7 Sutton R S, Precup D, Singh S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 1999, 112: 181-211.
  • 8 Whitehead S D, Lin L J. Reinforcement learning of non-Markov decision processes. Artificial Intelligence, 1995, 73: 271-306.
  • 9 Das T K, Gosavi A, Mahadevan S, et al. Solving semi-Markov decision problems using average reward reinforcement learning. Management Science, 1999, 45(4): 560-574.
  • 10 Kersting K, Van Otterlo M, De Raedt L. Bellman goes relational. Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004: 465-472.
