Abstract
After decades of development, reinforcement learning has achieved breakthrough progress in games, from Atari video games to large, complex decision-making games such as Go. While we expect reinforcement learning to also help find optimal decisions in real-world tasks, the openness of real-world tasks is essentially different from the closed nature of game worlds, and the key techniques for crossing the boundary between games and the real world are still missing. The successful applications of supervised learning show that the paradigm of learning from historical data has been widely accepted. Therefore, learning optimal decisions from offline (historical) data is one of the most promising technical paths for applying reinforcement learning in the real world. This paper surveys the possible approaches to offline reinforcement learning and introduces related advances. It is hoped that, by outlining the framework of offline reinforcement learning, more research in this field will be stimulated and reinforcement learning will be used to solve more decision-making problems in industry and everyday life.
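To make the setting concrete, the sketch below illustrates what "learning decisions from a fixed historical dataset" can look like when it goes through an environment (world) model, one of the routes named in the keywords. This is a minimal illustrative toy, not the method of the surveyed paper: it assumes a hypothetical linear-Gaussian system and a known reward function, uses only numpy, fits a dynamics model from logged transitions by least squares, and then scores simple linear policies by rollouts inside the learned model. Names such as rollout_return and the dataset itself are invented for illustration.

```python
import numpy as np

# Minimal sketch of offline, model-based decision learning (illustrative only):
# the learner sees a fixed dataset of transitions (s, a, s') collected by some
# behavior policy and never interacts with the real environment.
rng = np.random.default_rng(0)

# Hypothetical offline dataset, generated here from a linear-Gaussian system
# that is unknown to the learner.
n, ds, da = 2000, 3, 1
S = rng.normal(size=(n, ds))                      # logged states
A = rng.uniform(-1.0, 1.0, size=(n, da))          # logged actions
true_A = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.2],
                   [0.1, 0.0, 0.7]])
true_B = np.array([[0.5], [0.0], [0.3]])
S2 = S @ true_A.T + A @ true_B.T + 0.01 * rng.normal(size=(n, ds))  # next states

# Step 1 (world-model learning): fit a dynamics model s' ~ f(s, a) from the
# offline data only, here by least squares on the concatenated (s, a) inputs.
X = np.hstack([S, A])
W, *_ = np.linalg.lstsq(X, S2, rcond=None)

# Step 2: evaluate candidate policies by simulated rollouts in the learned
# model, never touching the real environment. The reward (keep the state near
# the origin) is assumed known for simplicity.
def rollout_return(gain, horizon=20):
    s = rng.normal(size=ds)
    total = 0.0
    for _ in range(horizon):
        a = np.clip(gain * s[:da], -1.0, 1.0)     # simple linear policy
        s = np.hstack([s, a]) @ W                 # model-predicted next state
        total += -np.sum(s ** 2)
    return total

candidates = np.linspace(-2.0, 2.0, 21)
best = max(candidates, key=lambda g: np.mean([rollout_return(g) for _ in range(30)]))
print("Best linear policy gain found inside the learned model:", best)
```

Practical offline model-based methods go further than this toy: because the model is learned from limited logged data, they typically penalize rollouts that stray into regions where the model is uncertain, so that the policy cannot exploit model errors.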
Author
YU Yang (State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing 210023)
Source
China Basic Science, 2022, No. 3, pp. 35-39, 46 (6 pages)
Keywords
reinforcement learning
offline reinforcement learning
world-model learning