Abstract
To improve the performance of reinforcement learning in multi-agent systems, this paper designs a new punitive reward function for Keepaway, a typical multi-agent platform in which every episode ends in failure. The design is inspired by the reward function used in the pole-balancing system, a single-agent system with the same characteristic. The new reward function gives a reward of zero in successful states and a negative penalty in failure states. The Sarsa(λ) algorithm using the new reward function is applied successfully to the Keepaway platform. Simulation results show that, under certain parameter settings, the new reward function effectively improves the performance of the reinforcement learning algorithm on Keepaway and yields a better final learning result.
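The reward scheme described in the abstract (zero in successful states, a negative value at failure) can be illustrated with a minimal tabular Sarsa(λ) sketch. This is not the paper's Keepaway implementation. The penalty magnitude of -1, the toy chain environment `toy_step`, the action names, and all hyperparameter values are assumptions made only for illustration.

```python
import random
from collections import defaultdict

FAILURE_PENALTY = -1.0  # assumed magnitude; the abstract only says "negative"


def punitive_reward(failed: bool) -> float:
    """Zero while the episode is still going, a negative penalty at failure."""
    return FAILURE_PENALTY if failed else 0.0


def toy_step(state: int, action: str):
    """Hypothetical stand-in environment in which every episode ends in failure.

    "bad" fails immediately; "good" survives until the fifth step.
    Returns (next_state, failed).
    """
    if action == "bad":
        return state, True
    next_state = state + 1
    return next_state, next_state >= 5


def sarsa_lambda_episode(q, step, start_state, actions,
                         alpha=0.1, gamma=0.9, lam=0.8, epsilon=0.1,
                         rng=random):
    """Run one episode of tabular Sarsa(lambda) with accumulating traces.

    Updates q in place and returns the episode length in steps.
    """
    def choose(state):
        # epsilon-greedy action selection; ties go to the first action listed
        if rng.random() < epsilon:
            return rng.choice(actions)
        return max(actions, key=lambda a: q[(state, a)])

    traces = defaultdict(float)
    state, action = start_state, choose(start_state)
    steps = 0
    while True:
        next_state, failed = step(state, action)
        reward = punitive_reward(failed)
        steps += 1
        if failed:
            target = reward  # terminal state: no bootstrap term
        else:
            next_action = choose(next_state)
            target = reward + gamma * q[(next_state, next_action)]
        delta = target - q[(state, action)]
        traces[(state, action)] += 1.0
        for key in list(traces):
            q[key] += alpha * delta * traces[key]
            traces[key] *= gamma * lam
        if failed:
            return steps
        state, action = next_state, next_action


if __name__ == "__main__":
    q_table = defaultdict(float)
    rng = random.Random(0)
    lengths = [sarsa_lambda_episode(q_table, toy_step, 0, ["bad", "good"], rng=rng)
               for _ in range(200)]
    print(lengths[:5], lengths[-5:])
```

With zero reward everywhere except at failure, the return of an episode that fails after K steps is the penalty scaled by gamma^(K-1), so in this sketch a discount factor below 1 is what makes later failures less costly and longer episodes preferable.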
Source
《控制工程》
CSCD
Peking University Core Journal (北大核心)
2009, No. 2, pp. 239-242 (4 pages)
Control Engineering of China
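Funding
Key Science and Technology Development Fund of the Beijing Municipal Education Commission (EM200610005019)
Doctoral Research Start-up Fund of Beijing University of Technology (52002011200708)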