Abstract
Because of the enlarged state-action space or the sparse rewards of complex environments, it is difficult for a reinforcement learning agent to learn an optimal policy from scratch. To address this, a heuristic accelerated deep Q-network based on an agent cognitive behavior model is proposed: symbolic rules are incorporated into the learning network to dynamically guide the agent's policy learning and thereby effectively accelerate it. The algorithm models heuristic knowledge as a BDI (Belief-Desire-Intention) cognitive behavior model, which generates cognitive behavioral knowledge to guide policy learning, and a heuristic policy network is designed to guide the agent's action selection online. Experiments in typical GYM environments and in the StarCraft II environment show that the algorithm can dynamically extract effective cognitive behavior knowledge as the environment changes and, with the help of the heuristic policy network, accelerate the convergence of the agent's policy.
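The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes, assuming a HAQL-style scheme in which rule-derived heuristic values bias the learned Q-values during action selection. The names BeliefDesireRule, heuristic_bias and select_action, and the weighting factor xi, are illustrative assumptions, not the paper's actual interfaces.

import numpy as np

class BeliefDesireRule:
    """One symbolic rule (hypothetical): when its belief predicate holds in the
    current state, it recommends an action with a given heuristic weight."""
    def __init__(self, belief, action, weight=1.0):
        self.belief = belief   # callable: state -> bool
        self.action = action   # index of the recommended action
        self.weight = weight   # strength of the recommendation

def heuristic_bias(rules, state, n_actions):
    """Collect the rules that fire in this state into a bias vector H(s, .)."""
    h = np.zeros(n_actions)
    for rule in rules:
        if rule.belief(state):
            h[rule.action] += rule.weight
    return h

def select_action(q_values, rules, state, epsilon=0.1, xi=0.5, rng=None):
    """Epsilon-greedy selection over Q(s, a) + xi * H(s, a), so the symbolic
    knowledge steers exploration without replacing the learned values."""
    rng = rng or np.random.default_rng()
    n_actions = len(q_values)
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_values + xi * heuristic_bias(rules, state, n_actions)))

# Toy usage: a rule saying "when the first state feature is negative, prefer action 1".
rules = [BeliefDesireRule(belief=lambda s: s[0] < 0, action=1, weight=2.0)]
q = np.array([0.3, 0.1, 0.2])          # learned Q-values for the current state
state = np.array([-1.0, 0.0])
print(select_action(q, rules, state, rng=np.random.default_rng(0)))  # -> 1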
Authors
Li Jiaxiang; Chen Hao; Huang Jian; Zhang Zhongjie (College of Artificial Intelligence, National University of Defense Technology, Changsha 410073, Hunan, China)
Source
Computer Applications and Software (《计算机应用与软件》), Peking University Core Journal, 2024, No. 9, pp. 148-155 (8 pages)
Funding
National Natural Science Foundation of China (61906202).
Keywords
Reinforcement learning
Cognitive behavior model
Heuristic accelerated deep Q network