Abstract
Reinforcement learning concerns controlling an autonomous agent in an unknown environment, often called the state space. The agent has no prior knowledge of the environment and can only acquire knowledge by acting in it. Reinforcement learning, and Q-learning in particular, faces a major problem: learning the Q-function in tabular form may be infeasible, because the memory needed to store the table is excessive and the Q-function converges only after each state has been visited many times. Large state spaces thus inevitably produce the "curse of dimensionality": the size of the state space grows exponentially with the number of features, and convergence slows accordingly. To address this, a hierarchical reinforcement learning method based on a heuristic reward function is proposed. The method greatly reduces the state space and speeds up learning: actions are chosen purposefully and efficiently so as to optimize the reward function and accelerate convergence. The method is applied to a Tetris simulation platform; analysis of the algorithm and the experimental results, under the chosen parameter settings, show that it can partly solve the curse of dimensionality and markedly speed up convergence.
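The abstract gives only a high-level description. As a minimal sketch of the underlying idea (tabular Q-learning whose sparse environment reward is augmented by a hand-crafted heuristic shaping term), the following Python fragment may help; the env object, the Tetris-style state features (holes, stack height), and all parameter values are illustrative assumptions, not the paper's actual design, and the hierarchical decomposition is omitted.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters (assumed, not from the paper).
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def heuristic_reward(state, next_state):
    """Hand-crafted shaping signal: penalize creating holes or raising
    the stack (typical Tetris heuristics; feature choice is assumed)."""
    holes, height = state
    next_holes, next_height = next_state
    return -1.0 * (next_holes - holes) - 0.5 * (next_height - height)

def q_learning(env, episodes=1000, actions=range(4)):
    # Tabular Q-function: (state, action) -> value; states must be hashable.
    Q = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()          # env is a hypothetical Tetris-like interface
        done = False
        while not done:
            # Epsilon-greedy action selection (trial-and-error exploration).
            if random.random() < EPSILON:
                action = random.choice(list(actions))
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Augment the sparse environment reward with the heuristic term.
            shaped = reward + heuristic_reward(state, next_state)
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += ALPHA * (shaped + GAMMA * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q
```

The shaping term steers exploration toward useful actions early on, which is one way a heuristic reward can speed up convergence; the paper's state-space reduction comes from its hierarchical decomposition, which this sketch does not attempt to reproduce.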
Source
Journal of Computer Research and Development (《计算机研究与发展》), 2011, No. 12, pp. 2352-2358 (7 pages). Indexed in EI and CSCD; PKU Core Journal.
Funding
National Natural Science Foundation of China (60873116, 61070223, 61070122)
Natural Science Foundation of Jiangsu Province (BK2008161, BK2009116)
Natural Science Research Foundation of Jiangsu Higher Education Institutions (09KJA520002)
Foundation of the Jiangsu Province Engineering Research Center of Support Software for Modern Enterprise Information Application (SX200804)
Keywords
hierarchical reinforcement learning
trial-and-error
heuristic reward function
Tetris
curse of dimensionality