Abstract
In real dynamic game scenarios, the two adversaries are characterized by unequal information, different working mechanisms, and different rules. Existing reinforcement learning algorithms, however, fit approximate models by assuming that the state is fully or partially observable. When the opponent's state information is difficult or impossible to obtain accurately, this assumption no longer holds, so existing reinforcement learning models cannot be applied directly. To address this problem, a new framework of asymmetric unobservable reinforcement learning is proposed, under which an agent can learn online from value feedback alone. To verify the feasibility and generality of the framework, three typical reinforcement learning algorithms are transplanted into it, and a game confrontation model is built for comparative verification. The results show that all three algorithms can be successfully applied to dynamic game environments with unobservable states and that their convergence speed is greatly improved, which demonstrates the feasibility and generality of the proposed framework.
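According to the abstract, the defining property of the proposed framework is that an agent learns online from scalar value feedback alone, without observing its own or the opponent's state. As a rough, hypothetical sketch only (the class, the env.step interface, and all parameters below are assumptions, not the paper's actual method), value-feedback-only online learning in the spirit of stateless Q-learning could look like this:

```python
import random

# Hypothetical sketch, not the paper's framework: an agent that
# never observes any state and updates per-action value estimates
# from scalar value feedback alone (stateless Q-learning / bandit).
class ValueFeedbackAgent:
    def __init__(self, n_actions, lr=0.1, epsilon=0.1):
        self.q = [0.0] * n_actions   # running value estimate per action
        self.lr = lr                 # learning rate
        self.epsilon = epsilon       # exploration probability

    def act(self):
        # Epsilon-greedy over action values; note there is no state input.
        if random.random() < self.epsilon:
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=self.q.__getitem__)

    def update(self, action, value):
        # Online incremental update driven only by the value feedback.
        self.q[action] += self.lr * (value - self.q[action])

# Usage against an opaque adversarial environment `env` exposing only
# a step(action) -> value interface (also hypothetical):
#   agent = ValueFeedbackAgent(n_actions=4)
#   for _ in range(10_000):
#       a = agent.act()
#       agent.update(a, env.step(a))
```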
Authors
LI Xinzhi; DONG Shengbo; CUI Xiangyang (Beijing Institute of Remote Sensing Equipment, Beijing 100854, China; State Key Laboratory of Communication Content Cognition, Beijing 100733, China)
Source
Systems Engineering and Electronics
EI
CSCD
PKU Core
2023, No. 6, pp. 1755-1761 (7 pages)
Keywords
reinforcement learning
dynamic game
asymmetric unobservable state