Knowledge-Driven Course of Action Planning for Intelligent Game Confrontation

Abstract: Deep reinforcement learning methods applied to practical course of action planning for game confrontation suffer from the exploration-exploitation dilemma, sparse reward signals, low data utilization, and difficulty converging stably. To address these problems, this paper analyzes the knowledge-based generation mode of learning-type intelligence and proposes a knowledge-driven method. A course of action planning model for intelligent game confrontation is constructed from aspects including teaching with rules, learning from data, and guiding with problems. The model provides theoretical support for improving exploration-exploitation efficiency, designing more accurate reward functions, and accelerating algorithm convergence. The difficulties of solving intelligent game confrontation problems with reinforcement learning are discussed, and directions for making deep reinforcement learning algorithms practical are pointed out.

Authors: CHEN Xiliang; CAO Lei; KANG Kai; LI Chenxi (College of Command and Control Engineering, Army Engineering University, Nanjing 210007, China; Unit 31108 of PLA, Nanjing 210007, China)

Affiliations: College of Command and Control Engineering, Army Engineering University; PLA

Source: Journal of Command and Control (指挥与控制学报), indexed in CSCD and the Peking University Core Journals list, 2024, No. 4, pp. 509-515 (7 pages)

Funding: Supported by the National Natural Science Foundation of China (62273356).

Keywords: deep reinforcement learning; intelligent game confrontation; knowledge driven; course of action planning

Classification number: G63 [Cultural Sciences, Education]

