
Greedy feature replacement for online value function approximation

Abstract: Reinforcement learning (RL) in real-world problems requires function approximation, which depends on selecting appropriate feature representations. Representational expansion techniques can make linear approximators represent value functions more effectively; however, most of these techniques work well only for low-dimensional problems. In this paper, we present greedy feature replacement (GFR), a novel online expansion technique for value-based RL algorithms that use binary features. Given a simple initial representation, the feature representation is expanded incrementally: new feature dependencies are added automatically to the current representation, and conjunctive features are used to replace current features greedily. A virtual temporal difference (TD) error is recorded for each conjunctive feature to judge whether the replacement can improve the approximation. Correctness guarantees and a computational complexity analysis are provided for GFR. Experimental results in two domains show that GFR achieves much faster learning and can handle large-scale problems.
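The mechanism described in the abstract can be sketched roughly in Python. The snippet below is an illustrative sketch only, not the authors' implementation: a linear approximator over binary features in which each candidate conjunctive feature accumulates a "virtual" TD error, and a candidate greedily replaces its parent features once that accumulated error passes a threshold. All names (GFRApproximator, replace_threshold, etc.) and the specific replacement test are assumptions made for illustration; the paper itself specifies the actual replacement criterion, the correctness guarantees, and the complexity analysis.

# Illustrative sketch of the greedy-feature-replacement idea (assumed, not the authors' code).
import itertools
import random
from collections import defaultdict


class GFRApproximator:
    def __init__(self, num_base_features, alpha=0.1, replace_threshold=1.0):
        self.alpha = alpha
        self.replace_threshold = replace_threshold
        # Current representation: feature keys are frozensets of base-feature
        # indices (singletons initially, conjunctions after replacement).
        self.features = {frozenset([i]) for i in range(num_base_features)}
        self.weights = defaultdict(float)
        # Accumulated virtual TD error for candidate conjunctive features.
        self.virtual_error = defaultdict(float)

    def active(self, state):
        # A state is the set of active base-feature indices; a feature is
        # active when all of its base features are active.
        return [f for f in self.features if f <= state]

    def value(self, state):
        return sum(self.weights[f] for f in self.active(state))

    def update(self, state, td_error):
        active = self.active(state)
        # Ordinary linear TD update on the current representation.
        for f in active:
            self.weights[f] += self.alpha * td_error
        # Record virtual TD error for pairwise conjunctions of active features,
        # i.e., roughly what a finer feature would have absorbed had it existed.
        for f, g in itertools.combinations(active, 2):
            if f not in self.features or g not in self.features:
                continue  # a parent was already replaced during this update
            cand = f | g
            self.virtual_error[cand] += td_error
            # Simplified replacement test (an assumption): replace greedily
            # once the accumulated virtual error is large enough.
            if abs(self.virtual_error[cand]) > self.replace_threshold:
                self._replace(f, g, cand)

    def _replace(self, f, g, cand):
        # Greedily replace two parent features with their conjunction.
        self.features.discard(f)
        self.features.discard(g)
        self.features.add(cand)
        self.weights[cand] = self.weights.pop(f, 0.0) + self.weights.pop(g, 0.0)
        self.virtual_error.pop(cand, None)


# Toy usage: random binary states and noisy targets.
approx = GFRApproximator(num_base_features=6)
for _ in range(200):
    state = {i for i in range(6) if random.random() < 0.5}
    target = random.random()
    approx.update(state, target - approx.value(state))
print(len(approx.features), "features after expansion")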
Source: Journal of Zhejiang University-Science C (Computers and Electronics), indexed in SCIE and EI, 2014, Issue 3, pp. 223-231 (9 pages)
Funding: Project supported by the 12th Five-Year Defense Exploration Project of China (No. 041202005) and the Ph.D. Program Foundation of the Ministry of Education of China (No. 20120002130007)
Keywords: Reinforcement learning, Function approximation, Feature dependency, Online expansion, Feature replacement

References (22)

  • 1. Pazis, J., Lagoudakis, M.G., 2009. Binary action search for learning continuous-action control policies. Proc. 26th Annual Int. Conf. on Machine Learning, p.793-800. [doi:10.1145/1553374.1553476]
  • 2. Singh, S.P., Yee, R.C., 1994. An upper bound on the loss from approximate optimal-value functions. Mach. Learn., 16(3):227-233. [doi:10.1007/BF00993308]
  • 3. Singh, S., Jaakkola, T., Littman, M.L., et al., 2000. Convergence results for single-step on-policy reinforcement-learning algorithms. Mach. Learn., 38(3):287-308. [doi:10.1023/A:1007678930559]
  • 4. Watkins, C.J.C.H., Dayan, P., 1992. Q-learning. Mach. Learn., 8(3-4):279-292. [doi:10.1007/BF00992698]
  • 5. Kaelbling, L.P., Littman, M.L., Moore, A.W., 1996. Reinforcement learning: a survey. J. Artif. Intell. Res., 4:237-285. [doi:10.1613/jair.301]
  • 6. Buro, M., 1999. From simple features to sophisticated evaluation functions. Proc. 1st Int. Conf. on Computers and Games, p.126-145. [doi:10.1007/3-540-48957-6_8]
  • 7. Sutton, R.S., Barto, A.G., 1998. Reinforcement Learning: an Introduction. MIT Press, Cambridge, MA, USA, p.3-25.
  • 8. Albus, J.S., 1971. A theory of cerebellar function. Math. Biosci., 10(1-2):25-61. [doi:10.1016/0025-5564(71)90051-4]
  • 9. Puterman, M.L., 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York, NY, p.139-161.
  • 10. Tsitsiklis, J.N., van Roy, B., 1997. An analysis of temporal-difference learning with function approximation. IEEE Trans. Automat. Contr., 42(5):674-690. [doi:10.1109/9.580874]
