Reinforcement Learning from Offline Data: Approaches and Advances (离线数据强化学习:途径与进展)

Abstract: After decades of development, reinforcement learning has achieved breakthroughs in games such as Atari and in large-scale, complex decision-making games such as Go. Although reinforcement learning is also expected to help find optimal decisions in real-world tasks, the openness of real-world tasks differs fundamentally from the closed nature of game worlds, and the key technology for crossing the boundary between the two is still missing. The success of supervised learning applications shows that the paradigm of learning from historical data is widely accepted. Learning optimal decisions from offline (historical) data is therefore one of the most promising technical routes for applying reinforcement learning in the real world. This paper surveys possible approaches to offline reinforcement learning and introduces related advances. By outlining the framework of offline reinforcement learning, it aims to stimulate further research in the field and to promote the use of reinforcement learning for decision-making problems in industry and everyday life.

Author: YU Yang (俞扬), State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing 210023

Affiliation: State Key Laboratory of Novel Software Technology, Nanjing University

Source: China Basic Science (《中国基础科学》), 2022, Issue 3, pp. 35-39, 46 (6 pages)

Keywords: reinforcement learning; offline reinforcement learning; world-model learning

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
