An Algorithm for Hierarchical Policy Search Based on PSO (一种基于PSO的分层策略搜索算法)
Cited by: 1
Abstract: To overcome drawbacks of the hierarchical policy gradient reinforcement learning algorithm (HPGRL), such as its tendency to get trapped in local optima, a hierarchical policy search algorithm based on particle swarm optimization (PSO-HPS) is proposed. Following the task decomposition idea of MAXQ, a classical hierarchical reinforcement learning method, the designer first constructs the subtask hierarchy. Through direct interaction with the environment, PSO-HPS then exploits the strong global search ability of a particle swarm to evolve the parameterized policies of the compound subtasks and obtain optimized action policies. Experiments on negotiation deadlock resolution show that PSO-HPS is effective and that its performance is clearly superior to that of HPGRL.
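The record reproduces only the abstract, but the core idea it describes, replacing HPGRL's gradient update with a global-best particle swarm that searches the parameter space of the hierarchy's policies, can be sketched in Python as below. This is a minimal sketch, assuming one flat parameter vector per compound subtask and a stand-in fitness function; the names (evaluate_policy, pso_policy_search) and all hyperparameters are hypothetical, not taken from the paper, where fitness would be the return earned by executing the MAXQ subtask hierarchy against the environment (here, a negotiation deadlock resolution task).

import numpy as np

def evaluate_policy(theta):
    """Stand-in fitness: in PSO-HPS this would be the return obtained by
    rolling out the parameterized policies of the MAXQ subtask hierarchy
    through direct interaction with the environment."""
    return -float(np.sum((theta - 0.5) ** 2))  # smooth toy objective

def pso_policy_search(dim, swarm_size=20, iters=100,
                      w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over a policy parameter vector of length dim."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, (swarm_size, dim))   # particle positions
    v = np.zeros_like(x)                            # particle velocities
    pbest = x.copy()                                # personal best positions
    pbest_f = np.array([evaluate_policy(p) for p in x])
    gbest = pbest[np.argmax(pbest_f)].copy()        # global best position
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        f = np.array([evaluate_policy(p) for p in x])
        improved = f > pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmax(pbest_f)].copy()
    return gbest, float(pbest_f.max())

theta_star, best_return = pso_policy_search(dim=8)
print(theta_star, best_return)

Because the swarm needs only fitness values and no gradients, it can search noisy or non-differentiable policy spaces where a policy-gradient method such as HPGRL is prone to stalling in local optima, which matches the motivation stated in the abstract.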
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2008, No. 1, pp. 98-103 (6 pages). Indexed in EI and CSCD; Peking University Core Journal.
Funding: Supported by the Natural Science Foundation of Guangdong Province (No. 06029281, No. 05011905).
Keywords: Hierarchical Reinforcement Learning; Particle Swarm Optimization (PSO); Hierarchical Policies; Negotiation Deadlock
