Abstract
Reinforcement learning is an active research topic in artificial intelligence. Among the methods used to solve reinforcement learning problems, the least-squares method is a class of function approximation method that converges quickly and makes full use of sample data. Based on a study and analysis of the least-squares temporal difference algorithm (LSTD), this paper proposes a double-weights least-squares Sarsa algorithm (Double Weights With Least Squares Sarsa, DWLS-Sarsa). DWLS-Sarsa relates two weights to each other to obtain a target weight and uses the Sarsa method to control the temporal difference error. During training, the two weights take different values because they are updated on different samples, which lets the algorithm explore effectively. Driven by the distribution of the sample data, the gap between the two weights then gradually narrows until both converge to the same optimal value, which ensures convergence. Finally, DWLS-Sarsa is compared experimentally with other reinforcement learning algorithms. The results show that DWLS-Sarsa has better learning performance and robustness, handles local optima effectively, and improves performance at convergence.
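As a rough illustration of the least-squares temporal difference idea that the abstract builds on, the sketch below solves the standard LSTD(0) linear system A w = b, with A = Σ φ(φ − γφ')ᵀ and b = Σ φ r, over a batch of transitions. It is a minimal sketch under stated assumptions, not the paper's DWLS-Sarsa: the abstract does not specify how the two weights are combined, so that step is deliberately left out. The names `lstd_weights` and `solve_linear`, the ridge term `reg`, and the toy two-step chain are all hypothetical choices made for this example. For a Sarsa-style use, φ would be a state-action feature vector and φ' the features of the next state and the action actually taken.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
# One transition: (features phi, reward, features of the successor phi_next).
Transition = Tuple[Sequence[float], float, Sequence[float]]


def solve_linear(a: List[Vector], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; try a larger reg")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def lstd_weights(transitions: Sequence[Transition],
                 gamma: float = 0.9,
                 reg: float = 1e-6) -> Vector:
    """Batch LSTD(0): accumulate A and b over all samples, then solve for w.

    reg adds a small ridge term to the diagonal of A (an assumption of this
    sketch) so that the system stays solvable when samples are scarce.
    """
    k = len(transitions[0][0])
    a = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for phi, reward, phi_next in transitions:
        for i in range(k):
            b[i] += phi[i] * reward
            for j in range(k):
                a[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve_linear(a, b)


# Toy chain with one-hot features: s0 -> s1 (reward 0), s1 -> terminal (reward 1).
# The terminal successor is represented by an all-zero feature vector.
samples = [
    ([1.0, 0.0], 0.0, [0.0, 1.0]),
    ([0.0, 1.0], 1.0, [0.0, 0.0]),
]
weights = lstd_weights(samples, gamma=0.9)
print(weights)
```

With one-hot features the weights are the state values themselves, so the result is close to 0.9 for the first state and 1.0 for the second, slightly shrunk by the ridge term. Every sample contributes to A and b exactly once and the weights come from a single linear solve, which is the sense in which least-squares methods use sample data fully.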
Authors
LI Bin
LIU Quan
LI Bin; LIU Quan (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China; Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, Jiangsu 215006, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China)
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal
2020, Issue 12, pp. 210-217 (8 pages)
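Funding
National Natural Science Foundation of China (61772355, 61702055, 61502323, 61502329)
Major Program of Natural Science Research of Jiangsu Higher Education Institutions (18KJA520011, 17KJA520004)
Fund of the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172014K04, 93K172017K18)
Suzhou Applied Basic Research Program, Industrial Part (SYG201422).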
Keywords
Reinforcement learning
Function approximation
Least-squares
Temporal difference
Sarsa