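Abstract
Language-assisted reinforcement learning tasks can promote the generalization of reinforcement learning policies, and the key problem is to automatically learn a general representation of observations and language descriptions. Existing methods often learn a joint representation implicitly, which inevitably introduces spurious correlations present in the training set and thereby harms the policy's generalization and training efficiency. To address this problem, this paper proposes a conceptual reinforcement learning framework (CRL). CRL draws on conceptualization, a cognitive process that extracts similarities from entities to generate abstract representations. Using an attention-based concept encoder and restrictive loss functions, it explicitly learns a general and abstract conceptual representation that serves as the input of the reinforcement learning policy. The effectiveness of CRL is verified on commonly used language-conditioned tasks and text-game tasks. The results show that the conceptual representation substantially improves the policy's training efficiency (by up to 70%) and generalization performance (by up to 30%), and also improves the policy's interpretability.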
Language-assisted tasks have been proposed to improve the generalization ability of reinforcement learning policies. The key challenge is to learn a general representation across different scenarios. Existing studies often learn a joint representation implicitly, which may capture spurious correlations and consequently compromise the policy’s generalization performance and training efficiency. To address this issue, a conceptual reinforcement learning framework (CRL) is proposed. Inspired by the human cognitive process of extracting similarities from numerous instances to form conceptual abstractions, CRL incorporates a multi-level attention encoder and restrictive loss functions to learn a compact and invariant conceptual representation for the policy. Evaluations on challenging language-assisted tasks demonstrate that CRL significantly improves the policy’s training efficiency (by up to 70%) and generalization ability (by up to 30%). Additionally, the conceptual representation shows better interpretability than other representations.
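The abstract does not specify how the attention-based concept encoder is built. Purely as a hypothetical illustration (not the CRL encoder from the paper), the sketch below shows one common way an attention module can compress a variable number of entity features into a fixed number of "concept" slots: each slot is a query vector that attends over all entities, optionally shifted by a language embedding. The function names, the additive language conditioning, and the scaled dot-product scoring are all assumptions chosen for clarity.

```python
import math
from typing import List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concept_attention(
    entities: List[Vector],
    slot_queries: List[Vector],
    language: Optional[Vector] = None,
) -> List[Vector]:
    """Hypothetical concept pooling (illustrative assumption, not CRL itself).

    Each slot query attends over all entity features with scaled dot-product
    scores and returns the attention-weighted average of the entities.
    The result always has len(slot_queries) vectors regardless of how many
    entities there are, and it does not depend on the entity order.
    """
    if language is not None:
        # Assumed conditioning scheme: shift every slot query by the language embedding.
        slot_queries = [[qi + li for qi, li in zip(q, language)] for q in slot_queries]
    d = len(entities[0])
    scale = 1.0 / math.sqrt(d)
    concepts = []
    for q in slot_queries:
        weights = softmax([scale * dot(q, x) for x in entities])
        # Convex combination of entity features for this slot.
        concepts.append([sum(w * x[j] for w, x in zip(weights, entities)) for j in range(d)])
    return concepts


if __name__ == "__main__":
    entities = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    slots = [[2.0, 0.0], [0.0, 2.0]]
    out = concept_attention(entities, slots, language=[0.5, -0.5])
    print(len(out), len(out[0]))  # 2 2
```

The fixed slot count and order invariance are the properties that make such a pooled vector usable as a compact policy input; in a trained model the slot queries and any language projection would be learned parameters rather than constants.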
Authors
PENG Shaohui (彭少辉)
HU Xing (胡杏)
ZHI Tian (支天)
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049; Cambricon Technologies, Beijing 100080
Source
Chinese High Technology Letters (《高技术通讯》)
CAS
Peking University Core Journal
2024, No. 6, pp. 555-566 (12 pages)
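Funding
National Natural Science Foundation of China (62002338, U20A20227, U22A2028)
Chinese Academy of Sciences Youth Team Program for Stable Support of Basic Research (YSBR-029)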
Keywords
deep reinforcement learning (DRL)
language-assisted reinforcement learning task
text game
representation learning
mutual information optimization