Abstract
This paper studies actor-adaptive critic (actor-critic) reinforcement learning. Because the exploration strategy adopted by the actor affects learning performance, a hybrid exploration strategy is used to explore the environment. After analyzing the characteristics of learning and planning, the critic is given a learning method that integrates model learning with model-free learning, and the actor applies the new hybrid exploration strategy. The result is an actor-adaptive critic reinforcement learning algorithm with integrated planning. Simulation experiments indicate that the new algorithm achieves good learning performance.
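The abstract does not give the update rules, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a tabular actor-critic on a hypothetical six-state chain task, a hybrid exploration rule that mixes uniform random actions with softmax sampling, and Dyna-style planning in which the critic is also updated from transitions replayed out of a learned model. All names, the environment, and the parameter values (`alpha`, `beta`, `gamma`, `eps`, `planning_steps`) are illustrative assumptions.

```python
import math
import random

N_STATES, N_ACTIONS, GOAL = 6, 2, 5  # hypothetical toy chain, not from the paper


def step(s, a):
    """Toy chain: action 1 moves right, action 0 moves left; reward 1 on reaching GOAL."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    done = s2 == GOAL
    return (1.0 if done else 0.0), s2, done


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def choose_action(prefs, eps, rng):
    """Assumed hybrid exploration: uniform random with probability eps, else softmax sampling."""
    if rng.random() < eps:
        return rng.randrange(N_ACTIONS)
    return rng.choices(range(N_ACTIONS), weights=softmax(prefs))[0]


def train(episodes=200, planning_steps=5, alpha=0.1, beta=0.1,
          gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    V = [0.0] * N_STATES                                  # critic: state values
    prefs = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # actor: action preferences
    model = {}                                            # learned model: (s, a) -> (r, s2, done)
    visited = []                                          # states seen so far, for planning
    for _ in range(episodes):
        s = 0
        for _ in range(200):                              # step cap per episode
            a = choose_action(prefs[s], eps, rng)
            r, s2, done = step(s, a)
            # Model-free learning: the TD error drives both critic and actor.
            delta = r + (0.0 if done else gamma * V[s2]) - V[s]
            V[s] += alpha * delta
            prefs[s][a] += beta * delta
            # Model learning: remember the observed transition.
            model[(s, a)] = (r, s2, done)
            if s not in visited:
                visited.append(s)
            # Planning: replay simulated transitions to refine the critic only.
            for _ in range(planning_steps):
                ps = rng.choice(visited)
                pa = choose_action(prefs[ps], eps, rng)
                if (ps, pa) not in model:
                    continue
                pr, ps2, pdone = model[(ps, pa)]
                V[ps] += alpha * (pr + (0.0 if pdone else gamma * V[ps2]) - V[ps])
            if done:
                break
            s = s2
    return V, prefs
```

Calling `train()` returns the learned state values and action preferences. Setting `planning_steps=0` reduces the sketch to plain model-free actor-critic, which gives a simple baseline for checking what the planning updates contribute.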
Source
Journal of Inner Mongolia University: Natural Science Edition (《内蒙古大学学报(自然科学版)》)
Indexed in: CAS, CSCD, PKU Core (北大核心)
2008, No. 3, pp. 346-350 (5 pages)
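Funding
Guangxi Natural Science Foundation (桂科自0481016)
Doctoral Fund of Guangxi University of Technology (031002)
Key Project of the Ministry of Education (204031)
Doctoral Fund of Inner Mongolia University (203043)
Inner Mongolia University "513" Talent Program (205144)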
Keywords
reinforcement learning
actor
critic
planning
exploration strategy