
SHP-VI: A Shortest Hamiltonian Path-Based POMDP Value Iteration Algorithm

Cited by: 1
Abstract: Trial-based value iteration is a class of efficient algorithms for solving partially observable Markov decision processes (POMDPs), and FSVI is currently among the fastest of these algorithms. For large-scale POMDP problems, however, the time FSVI spends computing the underlying MDP value function is not negligible. This paper proposes a shortest-Hamiltonian-path-based value iteration algorithm (SHP-VI). The method computes an optimal belief-state trajectory by solving a shortest Hamiltonian path problem with ant colony optimization, and then updates the value function over the encountered belief states in reverse order. Experimental comparison with FSVI shows that SHP-VI greatly improves the efficiency with which trial-based algorithms compute belief-state trajectories.
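To make the trial structure described in the abstract concrete, here is a minimal Python sketch of one SHP-VI-style trial. This is an illustration under assumptions, not the authors' implementation: the POMDP is assumed given as dense NumPy arrays (`T[a][s, s']` transition probabilities, `Z[a][s', o]` observation probabilities, `R[s, a]` rewards), and the action sequence that SHP-VI would derive from the ant-colony shortest-Hamiltonian-path solution is simply passed in as `actions`. All names are hypothetical.

```python
# Hypothetical sketch of a trial-based POMDP value iteration step;
# interfaces are illustrative, not taken from the paper.
import numpy as np

def belief_update(b, a, o, T, Z):
    """Bayes update: b'(s') is proportional to Z[a][s', o] * sum_s b(s) T[a][s, s']."""
    b_next = Z[a][:, o] * (b @ T[a])
    total = b_next.sum()
    return b_next / total if total > 0 else b

def point_backup(b, alphas, T, Z, R, gamma):
    """Point-based Bellman backup at belief b; returns one new alpha vector."""
    best_alpha, best_val = None, -np.inf
    for a in range(len(T)):
        alpha = R[:, a].astype(float)
        for o in range(Z[a].shape[1]):
            # Pick the alpha vector that is best at the successor belief,
            # folded back through the observation and transition models.
            g = max((Z[a][:, o] * v for v in alphas),
                    key=lambda g: b @ (T[a] @ g))
            alpha += gamma * (T[a] @ g)
        if b @ alpha > best_val:
            best_alpha, best_val = alpha, b @ alpha
    return best_alpha

def shp_vi_trial(b0, actions, alphas, T, Z, R, gamma, rng):
    """One trial: roll a belief trajectory forward along `actions`,
    then back up the value function over the visited beliefs in reverse."""
    trajectory, b = [b0], b0
    for a in actions:
        p_obs = (b @ T[a]) @ Z[a]                  # P(o | b, a)
        o = rng.choice(len(p_obs), p=p_obs / p_obs.sum())
        b = belief_update(b, a, o, T, Z)
        trajectory.append(b)
    for b in reversed(trajectory):                 # updates in reversed order
        alphas.append(point_backup(b, alphas, T, Z, R, gamma))
    return alphas
```

The alpha-vector list must be seeded before the first trial, e.g. with the single vector `np.full(n_states, R.min() / (1 - gamma))` and `rng = np.random.default_rng(0)`; repeated trials then grow the set, as in other point-based methods such as PBVI and Perseus cited below.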
Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, Peking University Core), 2011, No. 12: 2343-2351 (9 pages).
Funding: National Science and Technology Major Project of China (2009ZX10005-019); National Basic Research Program of China (973 Program) (2006CB504601); National Key Technology R&D Program of China (2007BAI10B06-01); Beijing Municipal Science and Technology Commission research projects (D08050703020803, D08050703020804).
Keywords: partially observable Markov decision process (POMDP); value iteration; point-based algorithm; trial-based algorithm; Hamiltonian path
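The abstract attributes the belief trajectory to an ant-colony solution of a shortest Hamiltonian path problem. The sketch below is a generic ACO solver for that subproblem, again a hypothetical illustration rather than the paper's implementation; it assumes a symmetric matrix of positive pairwise distances, and leaves open how SHP-VI defines distances between states, which this record does not describe.

```python
# Hypothetical ACO sketch for the shortest Hamiltonian path subproblem;
# parameter names and defaults are illustrative, not from the paper.
import numpy as np

def aco_shortest_hamiltonian_path(dist, n_ants=20, n_iters=100,
                                  alpha=1.0, beta=2.0, rho=0.1, seed=0):
    """Ant colony optimization for an open path visiting every node once.
    dist: (n, n) symmetric matrix of positive pairwise distances."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    tau = np.ones((n, n))                      # pheromone trails
    eta = 1.0 / (dist + np.eye(n))             # heuristic desirability
    best_path, best_len = None, np.inf
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            path = [int(rng.integers(n))]
            while len(path) < n:
                cur, seen = path[-1], set(path)
                cand = [j for j in range(n) if j not in seen]
                w = np.array([tau[cur, j]**alpha * eta[cur, j]**beta
                              for j in cand])
                path.append(cand[rng.choice(len(cand), p=w / w.sum())])
            length = sum(dist[path[i], path[i+1]] for i in range(n - 1))
            tours.append((path, length))
            if length < best_len:
                best_path, best_len = path, length
        tau *= 1.0 - rho                       # evaporation
        for path, length in tours:             # reward short paths
            for i in range(n - 1):
                tau[path[i], path[i+1]] += 1.0 / length
                tau[path[i+1], path[i]] += 1.0 / length
    return best_path, best_len
```

In SHP-VI the resulting visiting order over states would drive the choice of actions for the forward belief trajectory sketched above.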

References (18 total; 10 listed)

  • 1. Sondik E J. The optimal control of partially observable Markov processes [D]. Stanford, CA: Stanford University, 1971.
  • 2. Sondik E J. The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs [J]. Operations Research, 1978, 26(2): 282-304.
  • 3. Kaelbling L P, Littman M L, Cassandra A R. Planning and acting in partially observable stochastic domains [J]. Artificial Intelligence, 1998, 101(1/2): 99-134.
  • 4. Smith T. Probabilistic planning for robotic exploration [D]. Pittsburgh, PA: Carnegie Mellon University, 2007.
  • 5. Hauskrecht M, Fraser H. Planning treatment of ischemic heart disease with partially observable Markov decision processes [J]. Artificial Intelligence in Medicine, 2000, 18(3): 221-244.
  • 6. 张波, 蔡庆生, 郭百宁. A POMDP model of spoken dialogue systems and its solution [J]. Journal of Computer Research and Development (计算机研究与发展), 2002, 39(2): 217-224. Cited by: 7.
  • 7. Pineau J, Gordon G, Thrun S. Point-based value iteration: An anytime algorithm for POMDPs [C] //Proc of the 18th Int Joint Conf on Artificial Intelligence (IJCAI). San Francisco: Morgan Kaufmann, 2003: 1025-1030.
  • 8. Spaan M T J, Vlassis N. Perseus: Randomized point-based value iteration for POMDPs [J]. Journal of Artificial Intelligence Research, 2005, 24(1): 195-220.
  • 9. 卞爱华, 王崇骏, 陈世福. A preprocessing method for point-based POMDP algorithms [J]. Journal of Software (软件学报), 2008, 19(6): 1309-1316. Cited by: 6.
  • 10. Izadi M T, Precup D. Exploration in POMDP belief space and its impact on value iteration approximation [C] //Proc of the 17th European Conf on Artificial Intelligence (ECAI). Amsterdam: IOS, 2006.

Secondary references (11 total; 10 listed)

  • 1. 周继恩, 刘贵全, 张春阳, 蔡庆生. Application of an internal-belief-state POMDP model to user interest acquisition [J]. Journal of Chinese Computer Systems (小型微型计算机系统), 2004, 25(11): 1979-1983. Cited by: 5.
  • 2. 陈茂, 陈小平. A sampling-based approximation algorithm for POMDPs [J]. Computer Simulation (计算机仿真), 2006, 23(5): 64-67. Cited by: 2.
  • 3. Walker M A. An application of reinforcement learning to dialogue strategy selection in a spoken dialogue system for email [J]. Journal of Artificial Intelligence Research, 2000, 12: 387-416.
  • 4. Levin E, Pieraccini R, Eckert W. Using Markov decision process for learning dialogue strategies [C] //Proc of the Int Conf on Acoustics, Speech, and Signal Processing (ICASSP-97). Munich, Germany, 1997.
  • 5. Paek T, Horvitz E. Uncertainty, utility, and misunderstanding: A decision-theoretic perspective on grounding in conversational systems [C] //AAAI Fall Symp on Psychological Models of Communication in Collaborative Systems. North Falmouth, MA, 1999.
  • 6. Kaelbling L P, Littman M L, Cassandra A R. Planning and acting in partially observable stochastic domains [J]. Artificial Intelligence, 1998, 101: 99-134.
  • 7. Roy N, Pineau J, Thrun S. Spoken dialogue management using probabilistic reasoning [C] //Proc of the 38th Annual Meeting of the Association for Computational Linguistics (ACL-2000). Hong Kong, 2000.
  • 8. Hauskrecht M. Value-function approximations for partially observable Markov decision processes [J]. Journal of Artificial Intelligence Research, 2000, 13: 33-94.
  • 9. Boutilier C, Poole D. Computing optimal policies for partially observable decision processes using compact representations [C] //Proc of the 13th National Conf on Artificial Intelligence (AAAI-96). Portland, OR, 1996: 1168-1175.
  • 10. Cassandra A, Littman M L, Zhang N L. Incremental pruning: A simple, fast, exact algorithm for partially observable Markov decision processes [C] //Proc of the 13th Annual Conf on Uncertainty in Artificial Intelligence (UAI-97). Providence, RI, 1997: 54-61.

Documents sharing references with this paper: 10

Co-cited documents: 5

Citing documents: 1

Secondary citing documents: 1
