Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations (cited by: 13)

Abstract: In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
Source: IEEE/CAA Journal of Automatica Sinica (自动化学报(英文版)), indexed in EI and CSCD, 2019, No. 1, pp. 1-31 (31 pages)
Keywords: reinforcement learning; dynamic programming; Markovian decision problems; aggregation; feature-based architectures; policy iteration; deep neural networks; rollout algorithms
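
The aggregation idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes one simple special case, commonly called hard aggregation: each original state maps to exactly one aggregate state through a hypothetical `feature` list, states within a group are weighted uniformly, and only a single fixed policy is evaluated. The function name `aggregate_policy_cost`, the toy chain, and all numbers are illustrative assumptions.

```python
def aggregate_policy_cost(P, g, feature, alpha, iters=2000):
    """Approximate the discounted cost of one fixed policy by hard aggregation.

    P[i][j]    : transition probabilities under the policy
    g[i]       : expected one-stage cost at state i
    feature[i] : index of the aggregate state that state i maps to
                 (every index 0..max(feature) must have at least one member)
    alpha      : discount factor in (0, 1)
    """
    n = len(P)
    m = max(feature) + 1
    members = [[i for i in range(n) if feature[i] == x] for x in range(m)]

    # Build the smaller aggregate model. Uniform weights over the members
    # of each group are an assumption of this sketch.
    P_hat = [[0.0] * m for _ in range(m)]
    g_hat = [0.0] * m
    for x, group in enumerate(members):
        w = 1.0 / len(group)
        for i in group:
            g_hat[x] += w * g[i]
            for j in range(n):
                P_hat[x][feature[j]] += w * P[i][j]

    # Solve r = g_hat + alpha * P_hat r by fixed-point iteration,
    # which converges because alpha < 1.
    r = [0.0] * m
    for _ in range(iters):
        r = [g_hat[x] + alpha * sum(P_hat[x][y] * r[y] for y in range(m))
             for x in range(m)]

    # Each original state inherits the cost of its aggregate state.
    return [r[feature[i]] for i in range(n)]


# Toy 4-state chain with two aggregate states (illustrative numbers only).
P = [
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]
g = [1.0, 1.0, 3.0, 3.0]
feature = [0, 0, 1, 1]
J_tilde = aggregate_policy_cost(P, g, feature, alpha=0.5)
print(J_tilde)
```

In this toy chain the states within each group have the same cost and the same probabilities of moving between groups, so the aggregate solution matches the exact policy cost. In general the result is only a piecewise-constant approximation, and its quality depends on how well the features group states with similar costs.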