Optimal Action Criterion and Algorithm Improvement of Real-Time Dynamic Programming (cited by: 8)

Abstract: This paper aims to improve the efficiency of real-time dynamic programming (RTDP) algorithms for solving Markov decision problems. The convergence criteria used by several typical RTDP algorithms are compared and analyzed. Using upper and lower bounds on the value function, the paper proposes a convergence criterion called the optimal action criterion, together with a branch selection strategy better suited to real-time algorithms. The optimal action criterion can identify earlier an optimal action for the current state that meets the precision requirement and can be executed immediately, while the new branch selection strategy speeds up satisfaction of this criterion. It can be proved that, under certain conditions, this incremental method yields an optimal policy of arbitrary precision. Based on these techniques, a bounded incremental real-time dynamic programming (BI-RTDP) algorithm is designed. In experiments on two typical simulated real-time environments, BI-RTDP outperforms the other state-of-the-art RTDP algorithms tested.
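The abstract does not state the criterion formally, so the following is only a minimal sketch of how a bound-based action test might look, not the paper's definition. It assumes a cost-minimizing MDP and certifies an action at state `s` when its upper-bound Q-value lies within `epsilon` of the smallest lower-bound Q-value. All names (`q_value`, `certified_action`, `v_lower`, `v_upper`, the dictionary layout) are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Tuple

State = Hashable
Action = Hashable
# transitions[(s, a)] is a list of (next_state, probability) pairs (assumed layout).
Transitions = Dict[Tuple[State, Action], List[Tuple[State, float]]]
Costs = Dict[Tuple[State, Action], float]
Values = Dict[State, float]


def q_value(s: State, a: Action, cost: Costs,
            transitions: Transitions, value: Values) -> float:
    """One-step backup: immediate cost plus expected value of the successor."""
    return cost[(s, a)] + sum(p * value[s2] for s2, p in transitions[(s, a)])


def certified_action(s: State, actions: List[Action], cost: Costs,
                     transitions: Transitions, v_lower: Values,
                     v_upper: Values, epsilon: float) -> Optional[Action]:
    """Return an action whose loss at s is provably at most epsilon, else None.

    Assumes v_lower <= V* <= v_upper holds for every successor state.
    Then Q*(s, a) <= Q_U(s, a) and V*(s) >= min_b Q_L(s, b), so the gap
    tested below bounds Q*(s, a) - V*(s) from above.
    """
    q_low = {a: q_value(s, a, cost, transitions, v_lower) for a in actions}
    q_up = {a: q_value(s, a, cost, transitions, v_upper) for a in actions}
    best = min(actions, key=lambda a: q_up[a])  # greedy on the upper bound
    if q_up[best] - min(q_low.values()) <= epsilon:
        return best
    return None  # bounds are still too loose; more backups are needed
```

In this sketch the action can be released for execution as soon as the test passes at the current state, even if the bounds elsewhere in the state space are still far apart, which is the kind of early commitment the abstract describes.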

Authors: Fan Changjie (范长杰), Chen Xiaoping (陈小平)

Affiliation: Department of Computer Science and Technology, University of Science and Technology of China

Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2008, No. 11, pp. 2869-2878 (10 pages)

Funding: Supported by the National Natural Science Foundation of China under Grant No. 60745002 and the National Basic Research Program of China (973 Program) under Grant No. 2003CB317002

Keywords: Markov decision process (MDP); real-time dynamic programming (RTDP); convergence criterion; incremental solving; heuristic search

Classification: TP301 [Automation and Computer Technology: Computer System Architecture]





