Abstract
The concept of exploration density (ED) is proposed: by inspecting a state's influence on the agent's ability to explore the environment, the method discovers subgoals for learning and constructs the corresponding Options. A reinforcement learning algorithm that creates Options with this method learns significantly faster. The method is task-independent and requires no prior knowledge, and the constructed Options can be shared directly among different tasks in the same environment. Simulation results show that the proposed algorithm improves reinforcement-learning performance.
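The abstract does not define the exploration-density measure itself. As a purely hypothetical illustration (not the paper's algorithm), the sketch below ranks gridworld states by how much blocking a state shrinks the region the agent can reach, which is one plausible way to quantify a state's influence on exploration ability; states around a doorway between two rooms surface as subgoal candidates. All function names and the two-room layout are invented for the example.

```python
# Illustrative stand-in for exploration-density-based subgoal discovery:
# a state whose removal makes many cells unreachable is a bottleneck,
# hence a subgoal candidate for building an Option.
from collections import deque

def neighbors(cell, walls, width, height):
    x, y = cell
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in walls:
            yield (nx, ny)

def reachable(start, walls, width, height, blocked=None):
    """BFS: all cells reachable from `start`, optionally with one cell blocked."""
    if start == blocked:
        return set()
    seen, queue = {start}, deque([start])
    while queue:
        cell = queue.popleft()
        for nxt in neighbors(cell, walls, width, height):
            if nxt != blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def subgoal_candidates(start, walls, width, height, top_k=1):
    """Rank states by how many cells become unreachable when that state is blocked."""
    base = reachable(start, walls, width, height)
    impact = {s: len(base) - len(reachable(start, walls, width, height, blocked=s))
              for s in base if s != start}
    return sorted(impact, key=impact.get, reverse=True)[:top_k]

# Two 5x5 rooms joined by a single doorway at (5, 2):
# the wall column x == 5 is solid except for the door cell.
width, height = 11, 5
walls = {(5, y) for y in range(5)} - {(5, 2)}
# The doorway and the two cells flanking it rank highest.
print(subgoal_candidates((0, 0), walls, width, height, top_k=3))
```

In a hierarchical-RL setting, each discovered candidate would become the terminal state of an Option; because the ranking depends only on the environment's connectivity, not on any reward function, the same candidates (and hence Options) carry over to other tasks in the same environment, matching the task-independence claim in the abstract.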
Source
《模式识别与人工智能》
EI
CSCD
Peking University Core Journals (北大核心)
2007, No. 2, pp. 236-240 (5 pages)
Pattern Recognition and Artificial Intelligence