Journal Articles — 2 results found
1. Hierarchical Reinforcement Learning With Automatic Sub-Goal Identification (cited by: 1)
Authors: Chenghao Liu, Fei Zhu, Quan Liu, Yuchen Fu
IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2021, Issue 10, pp. 1686-1696 (11 pages)
Abstract: In reinforcement learning, an agent may explore ineffectively when dealing with sparse-reward tasks in which finding a reward point is difficult. To address this problem, we propose hierarchical deep reinforcement learning with automatic sub-goal identification via computer vision (HADS), which takes advantage of hierarchical reinforcement learning to alleviate the sparse-reward problem and improves exploration efficiency through a sub-goal mechanism. HADS uses a computer vision method to identify sub-goals automatically for hierarchical deep reinforcement learning. Because not all sub-goal points are reachable, a mechanism is proposed to remove unreachable sub-goal points and further improve performance. HADS applies contour recognition to identify sub-goals from the state image: salient states in the image may be recognized as sub-goals, while those that are not are removed based on prior knowledge. Our experiments verified the effectiveness of the algorithm.
Keywords: hierarchical control, hierarchical reinforcement learning, option, sparse reward, sub-goal
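The abstract describes extracting sub-goal candidates from the state image by contour recognition and then filtering out unreachable points. Below is a minimal sketch of that idea, assuming an OpenCV pipeline; the Otsu threshold, the area bounds used as a saliency criterion, and the `is_reachable` predicate are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of sub-goal identification via contour recognition,
# loosely following the HADS description in the abstract above.
import cv2
import numpy as np

def identify_subgoals(state_image, min_area=20.0, max_area=500.0):
    """Return candidate sub-goal coordinates found as salient contours."""
    gray = cv2.cvtColor(state_image, cv2.COLOR_BGR2GRAY)
    # Binarize so salient objects stand out from the background (assumed criterion).
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    candidates = []
    for c in contours:
        area = cv2.contourArea(c)
        if min_area <= area <= max_area:  # ignore noise and large background regions
            m = cv2.moments(c)
            if m["m00"] > 0:
                # Use the contour centroid as the candidate sub-goal location.
                candidates.append((int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])))
    return candidates

def filter_reachable(candidates, is_reachable):
    """Drop sub-goal points the agent cannot reach, per prior knowledge."""
    return [p for p in candidates if is_reachable(p)]
```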
2. Experience Replay for Least-Squares Policy Iteration (cited by: 1)
Authors: Quan Liu, Xin Zhou, Fei Zhu, Qiming Fu, Yuchen Fu
IEEE/CAA Journal of Automatica Sinica (SCIE, EI), 2014, Issue 3, pp. 274-281 (8 pages)
Abstract: Policy iteration, which evaluates and improves the control policy iteratively, is a reinforcement learning method. Policy evaluation with the least-squares method can draw more useful information from the empirical data and thereby improve data validity. However, most existing online least-squares policy iteration methods use each sample only once, resulting in a low utilization rate. To improve utilization efficiency, we propose experience replay for least-squares policy iteration (ERLSPI) and prove its convergence. ERLSPI combines online least-squares policy iteration with experience replay: it stores the samples generated online and reuses them in the least-squares update of the control policy. We apply ERLSPI to the inverted pendulum system, a typical benchmark. The experimental results show that the method effectively takes advantage of previous experience and knowledge, improves sample utilization efficiency, and accelerates convergence.
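The abstract describes storing online samples and reusing them in a least-squares policy-evaluation update. Below is a minimal sketch of that scheme, assuming linear function approximation with an LSTD-Q-style solve; the class name `ERLSPI`, the feature handling, and the regularization term are assumptions for illustration, not the published algorithm.

```python
# Minimal sketch of the ERLSPI idea: keep every online transition in a buffer
# and reuse the whole buffer in a least-squares (LSTD-Q style) solve.
import numpy as np

class ERLSPI:
    def __init__(self, n_features, gamma=0.99, reg=1e-3):
        self.gamma = gamma
        self.reg = reg            # ridge term to keep A invertible (assumption)
        self.buffer = []          # stored (phi_sa, reward, phi_next_sa) samples
        self.w = np.zeros(n_features)

    def store(self, phi_sa, reward, phi_next_sa):
        """Keep each online sample so it can be reused, not used just once."""
        self.buffer.append((phi_sa, reward, phi_next_sa))

    def update(self):
        """Solve A w = b over all stored samples (least-squares evaluation)."""
        k = self.w.size
        A = self.reg * np.eye(k)
        b = np.zeros(k)
        for phi_sa, r, phi_next in self.buffer:
            A += np.outer(phi_sa, phi_sa - self.gamma * phi_next)
            b += r * phi_sa
        self.w = np.linalg.solve(A, b)
        return self.w
```

Note that in a full policy-iteration loop, the next-state features `phi_next_sa` would be recomputed under the current greedy policy at each iteration, which is what makes replayed transitions reusable across policy improvements.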