
An Integrating Planning Actor-Adaptive Critic Reinforcement Learning Algorithm
Abstract: Actor-adaptive-critic reinforcement learning methods are investigated. Since the exploration strategy adopted by the actor affects learning performance, a hybrid exploration strategy is used to explore the environment. The respective strengths of learning and planning are analyzed; a method that integrates model learning with model-free learning is applied in the critic, and the new hybrid exploration strategy is applied in the actor, yielding an actor-adaptive-critic reinforcement learning algorithm with integrated planning. Simulation experiments show that the new algorithm learns more effectively.
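The abstract does not reproduce the algorithm's pseudocode, but its ingredients (an actor driven by a hybrid exploration strategy, and a critic that combines model-free temporal-difference learning with Dyna-style planning over a learned model) can be sketched as follows. All function names, parameters, and the chain environment below are illustrative assumptions, not the authors' implementation; the hybrid strategy is guessed here as an ε-uniform/softmax mixture.

```python
import random
import math

def hybrid_action(prefs, state, n_actions, epsilon=0.1):
    """Hybrid exploration (an assumed form): with probability epsilon take a
    uniformly random action, otherwise sample from a softmax over the actor's
    action preferences."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    weights = [math.exp(prefs[(state, a)]) for a in range(n_actions)]
    r = random.random() * sum(weights)
    acc = 0.0
    for a, w in enumerate(weights):
        acc += w
        if r <= acc:
            return a
    return n_actions - 1

def dyna_actor_critic(env_step, n_states, n_actions, episodes=200,
                      alpha=0.1, beta=0.1, gamma=0.95, plan_steps=5):
    """Tabular actor-critic with Dyna-style planning in the critic.
    env_step(s, a) -> (reward, next_state, done)."""
    V = [0.0] * n_states                       # critic: state values
    prefs = {(s, a): 0.0 for s in range(n_states) for a in range(n_actions)}
    model = {}                                 # learned model: (s, a) -> (r, s', done)
    for _ in range(episodes):
        s = 0
        for _ in range(100):
            a = hybrid_action(prefs, s, n_actions)
            r, s2, done = env_step(s, a)
            model[(s, a)] = (r, s2, done)      # model learning
            target = r + (0.0 if done else gamma * V[s2])
            delta = target - V[s]              # TD error
            V[s] += alpha * delta              # model-free critic update
            prefs[(s, a)] += beta * delta      # actor update
            # planning: replay simulated experience drawn from the learned model
            for _ in range(plan_steps):
                (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                pt = pr + (0.0 if pdone else gamma * V[ps2])
                V[ps] += alpha * (pt - V[ps])
            if done:
                break
            s = s2
    return V, prefs
```

On a small deterministic chain task (move toward a rewarding terminal state), the planning loop propagates values faster than pure model-free updates, which is the motivation the abstract gives for integrating the two.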
Source: Journal of Inner Mongolia University (Natural Science Edition) (CAS, CSCD, PKU Core), 2008, No. 3, pp. 346-350 (5 pages).
Funding: Guangxi Natural Science Foundation (Guikezi 0481016); Guangxi University of Technology Doctoral Fund (031002); Ministry of Education Key Project (204031); Inner Mongolia University Doctoral Fund (203043); Inner Mongolia University "513" Talent Program (205144).
Keywords: reinforcement learning; actor; critic; planning; exploration strategy

References (7)

  • 1 Singh S. Learning to Solve Markovian Decision Processes [D]. MA: University of Massachusetts, 1994.
  • 2 Barto A G, Sutton R S, Anderson C W. Neuron-like elements that can solve difficult learning control problems [J]. IEEE Transactions on Systems, Man, and Cybernetics, 1983, 13: 835-846.
  • 3 Barto A G. Reinforcement learning and adaptive critic methods [A]. In: White D A, Sofge D A. Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches [C]. NY: Van Nostrand Reinhold, 1992. 469-491.
  • 4 Li Chungui, Liu Yongxin, Chen Bo. A multi-step truncated actor-critic reinforcement learning algorithm [J]. Journal of Inner Mongolia University (Natural Science Edition), 2005, 36(2): 210-213.
  • 5 Li Chungui, Wu Cangpu, Liu Yongxin. A SARSA(λ) reinforcement learning algorithm with integrated planning [J]. Journal of Beijing Institute of Technology, 2002, 22(3): 325-327.
  • 6 Koenig S, Simmons R G. The effect of representation and knowledge on goal-directed exploration with reinforcement learning algorithms [J]. Machine Learning, 1996, 22: 228-250.
  • 7 Moore A W, Atkeson C G. Prioritized sweeping: Reinforcement learning with less data and less real time [J]. Machine Learning, 1993, 13: 103-130.

