Abstract
To overcome drawbacks of the hierarchical policy gradient reinforcement learning (HPGRL) algorithm, such as its tendency to converge to local optima, a hierarchical policy search algorithm named PSO-HPS (Hierarchical Policy Search Based on PSO) is proposed. First, the designer constructs the subtask hierarchy following MAXQ, a classical hierarchical reinforcement learning method. Then, interacting directly with the environment, PSO-HPS uses a particle swarm, with its strong global search ability, to evolve the parameterized policies of the compound subtasks and thereby obtain optimized action policies. Finally, experiments on negotiation deadlock resolution show that PSO-HPS is effective and clearly outperforms HPGRL.
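The abstract gives only a high-level description of the search step. Below is a minimal, self-contained sketch of the core idea, assuming a caller-supplied `fitness` function that estimates a parameterized policy's episodic return by rollouts against the environment. All names and hyperparameter defaults (`pso_policy_search`, swarm size, inertia and acceleration coefficients) are illustrative assumptions, not the authors' implementation; in PSO-HPS one such swarm would be run over the parameter vector of each compound subtask in the MAXQ hierarchy.

```python
import numpy as np

def pso_policy_search(fitness, dim, n_particles=20, iters=100,
                      w=0.7, c1=1.5, c2=1.5, bounds=(-1.0, 1.0)):
    """Evolve a policy-parameter vector with a particle swarm.

    fitness: maps a parameter vector to an estimated episodic
             return (e.g., averaged over environment rollouts).
    Returns the best parameter vector found and its fitness.
    """
    lo, hi = bounds
    rng = np.random.default_rng(0)
    # Each particle is one candidate set of policy parameters.
    pos = rng.uniform(lo, hi, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()                       # personal best positions
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmax(pbest_val)].copy()  # global best position
    gbest_val = pbest_val.max()

    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Standard PSO velocity update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([fitness(p) for p in pos])
        improved = vals > pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        if vals.max() > gbest_val:
            gbest_val = vals.max()
            gbest = pos[np.argmax(vals)].copy()
    return gbest, gbest_val
```

Because the swarm update uses only fitness evaluations rather than policy gradients, such a search is less prone to the local optima that motivate the paper's departure from HPGRL.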
Source
《模式识别与人工智能》
EI
CSCD
PKU Core Journals
2008, No. 1, pp. 98-103 (6 pages)
Pattern Recognition and Artificial Intelligence
Funding
Supported by the Natural Science Foundation of Guangdong Province (No. 06029281, 05011905)
Keywords
Hierarchical Reinforcement Learning
Particle Swarm Optimization (PSO)
Hierarchical Policies
Negotiation Deadlock