A review on Markov Decision Processes (cited by: 4)

Abstract: Markov decision processes (MDPs) have been studied by mathematicians, probabilists, operations researchers and engineers since the late 1950s. In an MDP, a stochastic, dynamic system is controlled by a 'policy' selected by a decision-maker/controller, with the goal of maximizing an overall reward function that is an appropriately defined aggregate of immediate rewards, over either a finite or an infinite time horizon. As such, MDPs are a useful paradigm for modeling many processes that occur naturally in management and engineering contexts.
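To make the abstract's description concrete, the following is a minimal sketch, not taken from the paper. It assumes one particular choice of "aggregate of immediate rewards", namely the expected discounted sum over an infinite horizon, and solves a tiny MDP by value iteration. The two-state model `TRANSITIONS`, the discount factor `gamma`, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP, invented for illustration (not from the paper).
# TRANSITIONS[state][action] is a list of
# (probability, next_state, immediate_reward) outcomes.
Outcome = Tuple[float, int, float]
Model = Dict[int, Dict[int, List[Outcome]]]

TRANSITIONS: Model = {
    0: {
        0: [(1.0, 0, 1.0)],                  # stay in state 0, reward 1
        1: [(0.5, 1, 0.0), (0.5, 0, 0.0)],   # try to reach state 1
    },
    1: {
        0: [(1.0, 0, 0.0)],                  # return to state 0
        1: [(1.0, 1, 2.0)],                  # stay in state 1, reward 2
    },
}


def q_value(outcomes: List[Outcome], values: Dict[int, float],
            gamma: float) -> float:
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(model: Model, gamma: float = 0.9, tol: float = 1e-10,
                    max_iter: int = 10_000):
    """Return (optimal values, greedy policy) for the discounted criterion."""
    values = {s: 0.0 for s in model}
    for _ in range(max_iter):
        # Bellman optimality update: best action value in every state.
        new_values = {
            s: max(q_value(outcomes, values, gamma)
                   for outcomes in actions.values())
            for s, actions in model.items()
        }
        delta = max(abs(new_values[s] - values[s]) for s in model)
        values = new_values
        if delta < tol:
            break
    # A policy maps each state to the action with the highest value.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in model.items()
    }
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration(TRANSITIONS)
    print(values, policy)
```

Here the policy is a plain mapping from states to actions (a stationary deterministic policy). Finite-horizon or average-reward criteria, which the abstract also allows for, would need a different aggregate and a different update rule.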

Authors: J. A. Filar and LIU Ke. Centre for Industrial and Applicable Mathematics, University of South Australia, Australia; Institute of Applied Mathematics, Chinese Academy of Sciences, Beijing 100080, China

Source: Chinese Science Bulletin (SCIE, EI, CAS), 1999, Issue 7, p. 672 (1 page)

Keywords: A review on Markov Decision Processes

Classification code: O211.62 [Science: Probability Theory and Mathematical Statistics]
