
Adaptive uncertainty quantification for model-based offline reinforcement learning
Abstract Offline reinforcement learning (RL) can learn executable policies directly from historical experience data, avoiding costly interaction with the online environment, and is applicable to real-world scenarios such as robot control, autonomous driving, and intelligent marketing. Model-based offline RL first constructs an environment model via supervised learning and then optimizes the policy by interacting with that model; it is sample-efficient and is the most widely used family of offline RL algorithms. However, because offline datasets suffer from distributional shift, existing methods typically assess the resulting uncertainty with static measures that cannot adapt dynamically to the agent's policy-optimization process. To address this problem, an adaptive uncertainty quantification method is proposed: it first estimates the uncertainty of each state, and then measures the environment model's uncertainty with a dynamically adaptive weight, allowing the agent to strike a better balance between exploration and conservatism. The algorithm was evaluated on several benchmark offline datasets; experimental results show that it achieves the best performance on multiple datasets, and ablation studies further validate the effectiveness of the proposed measurements.
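The abstract does not give the algorithm's equations, so the following is only an illustrative sketch of the general pattern it describes: ensemble disagreement as a state-uncertainty estimate, subtracted from the reward with a weight that adapts during training. The ensemble construction, function names, and the linear weight schedule are all assumptions for illustration, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble of learned dynamics models, each mapping
# (state, action) -> predicted next state. Here they are random
# linear models purely for illustration.
def make_model():
    W = rng.normal(scale=0.1, size=(3, 3))
    return lambda s, a: s + W @ np.concatenate([a, [1.0]])

ensemble = [make_model() for _ in range(5)]

def state_uncertainty(s, a):
    """Estimate uncertainty as disagreement among ensemble
    predictions (max per-dimension standard deviation)."""
    preds = np.stack([m(s, a) for m in ensemble])
    return float(preds.std(axis=0).max())

def penalized_reward(r, s, a, step, total_steps, beta_max=2.0):
    """Reward minus an uncertainty penalty whose weight adapts over
    training: conservative early, more exploratory later. (This
    linear schedule is an illustrative assumption.)"""
    beta = beta_max * (1.0 - step / total_steps)
    return r - beta * state_uncertainty(s, a)
```

Because the penalty weight shrinks as training progresses, the agent is discouraged from out-of-distribution states early on but is allowed to exploit the model more aggressively once the policy has stabilized, which is one way to realize the exploration-conservatism trade-off the abstract mentions.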
Authors ZHANG Bolei; LIU Zherun (School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
Source Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2024, No. 4, pp. 98-104 (Peking University Core Journal)
Funding Supported by the National Natural Science Foundation of China (62202238).
Keywords offline reinforcement learning (RL); environment model; adaptive weight; uncertainty quantification