A Belief Selection Method in POMDP Point-Based Value Iteration Algorithm

Cited by: 3

Abstract: The partially observable Markov decision process (POMDP) is a mathematical model for decision making under uncertainty. Point-based value iteration algorithms are a class of approximate methods for solving POMDP problems. To address belief selection, a key issue in point-based algorithms, this paper proposes an entropy-based belief selection method (EBBS). EBBS computes the uncertainty (entropy) of the belief points reachable from the current belief set, sorts them by entropy, and expands the set with belief points that have lower entropy and whose distance to the current belief set exceeds a given threshold. Experimental results show that value iteration with entropy-based belief selection attains the expected discounted reward while performing value iteration on fewer belief points.
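The selection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance metric, the threshold value, or how many points are added per step, so the L1 distance, the one-point-per-belief expansion, and all names below (`entropy`, `belief_update`, `expand_once`, the toy matrices `T` and `O`) are assumptions made for illustration only.

```python
import math


def entropy(belief):
    """Shannon entropy (in nats) of a belief vector; 0 * log 0 is treated as 0."""
    return -sum(p * math.log(p) for p in belief if p > 0.0)


def l1_distance(b1, b2):
    """L1 distance between two beliefs (an assumed choice of metric)."""
    return sum(abs(p - q) for p, q in zip(b1, b2))


def belief_update(belief, action, obs, T, O):
    """Bayes filter step.

    T[a][s][s2] = P(s2 | s, a), O[a][s2][o] = P(o | s2, a).
    Returns None when the observation has zero probability.
    """
    n = len(belief)
    unnorm = [
        O[action][s2][obs] * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnorm)
    if total <= 0.0:
        return None
    return [u / total for u in unnorm]


def expand_once(belief_set, T, O, n_obs, threshold):
    """One expansion pass (assumed scheme): for each belief in the set,
    visit its reachable successors in order of increasing entropy and add
    the first one whose L1 distance to every point already in the set
    exceeds `threshold`."""
    new_set = list(belief_set)
    for b in belief_set:
        successors = []
        for a in range(len(T)):
            for o in range(n_obs):
                nb = belief_update(b, a, o, T, O)
                if nb is not None:
                    successors.append(nb)
        successors.sort(key=entropy)  # lowest-entropy candidates first
        for nb in successors:
            if min(l1_distance(nb, c) for c in new_set) > threshold:
                new_set.append(nb)
                break
    return new_set


# Toy model (hypothetical): 2 states, 1 "listen" action, 2 observations.
T = [[[1.0, 0.0], [0.0, 1.0]]]      # the state never changes
O = [[[0.85, 0.15], [0.15, 0.85]]]  # the observation is correct 85% of the time
print(expand_once([[0.5, 0.5]], T, O, n_obs=2, threshold=0.1))
```

Preferring low-entropy successors biases the set toward beliefs where the agent is more certain of the state, while the distance threshold keeps the set from filling up with near-duplicates. Value iteration backups would then be run only on the resulting set.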

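Authors: Feng Qi (冯奇), Zhou Xuezhong (周雪忠), Huang Houkuan (黄厚宽), Zhang Xiaoping (张小平)

Affiliation: School of Computer and Information Technology, Beijing Jiaotong University

Source: Journal of Beijing Jiaotong University (《北京交通大学学报》); indexed in CAS, CSCD, and the Peking University Core Journals list; 2009, No. 5, pp. 77-80 (4 pages)

Funding: National Natural Science Foundation of China (90709006); National "973" Program (2006CB504601); Beijing Municipal Science and Technology Commission Major Program (H020920010130); National Key Technology R&D Program (2007BA110B06-01)

Keywords: POMDP; value iteration; point-based algorithm; belief selection; uncertainty

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]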
