Research on a Bounded Rational Game Algorithm for Ship Target Tracking Based on Reinforcement Learning

Original title: 基于强化学习的舰船目标跟踪有限理性博弈算法研究

Abstract: Because real-world decision-makers cannot always analyze problems with perfect rationality, this paper proposes a pursuit-evasion game algorithm under bounded rationality. A pursuit-evasion game model is established, and the saddle-point strategies of the two players under perfect rationality are solved first. The bounded-rationality level-k model is then introduced, which makes a structural assumption about the depth of strategic reasoning of the pursuer and the evader. This allows the two sides to have different strategic reasoning abilities. The value functions and strategies for each level are given, and the strategies satisfy the Hamilton-Jacobi-Isaacs (HJI) equation. As the level increases, the strategies eventually tend toward the Nash equilibrium. Because the HJI equation is difficult to solve directly, it is solved with an actor-critic algorithm based on reinforcement learning, and the algorithm is designed so that the pursuer can estimate the evader's thinking level and adopt an appropriate strategy. Taking ships as the object of study, ship motion is simplified to a two-dimensional mathematical model, a ship pursuit-evasion game model is established, and the algorithm is verified by simulation.
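The level-k idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the paper defines level-k value functions through the HJI equation and solves them with an actor-critic method, whereas the code below substitutes a one-step greedy best response on a point-mass model. All names (`level_k_heading`, `steps_to_capture`), the speeds, the time step, the 16-way heading grid, and the two level-0 behaviours are assumptions made only for illustration.

```python
import math

# Assumed discretization: 16 evenly spaced candidate headings (radians).
HEADINGS = [2 * math.pi * i / 16 for i in range(16)]

def move(pos, heading, speed, dt):
    """Advance a point-mass ship one time step along a heading."""
    return (pos[0] + speed * math.cos(heading) * dt,
            pos[1] + speed * math.sin(heading) * dt)

def dist(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def level_k_heading(role, k, pursuer, evader, vp=1.5, ve=1.0, dt=0.1):
    """Heading of a level-k player under one-step lookahead (illustrative)."""
    if k == 0:
        if role == "pursuer":
            # Assumed level-0 pursuer: pure pursuit of the evader's current position.
            return math.atan2(evader[1] - pursuer[1], evader[0] - pursuer[0])
        # Assumed level-0 evader: hold a fixed course along +x.
        return 0.0
    if role == "pursuer":
        # Predict a level-(k-1) evader, then minimise the resulting distance.
        he = level_k_heading("evader", k - 1, pursuer, evader, vp, ve, dt)
        e_next = move(evader, he, ve, dt)
        return min(HEADINGS, key=lambda h: dist(move(pursuer, h, vp, dt), e_next))
    # Predict a level-(k-1) pursuer, then maximise the resulting distance.
    hp = level_k_heading("pursuer", k - 1, pursuer, evader, vp, ve, dt)
    p_next = move(pursuer, hp, vp, dt)
    return max(HEADINGS, key=lambda h: dist(p_next, move(evader, h, ve, dt)))

def steps_to_capture(kp, ke, pursuer, evader, radius=0.2, max_steps=500,
                     vp=1.5, ve=1.0, dt=0.1):
    """Simulate until the ships are within `radius`; return step count or None."""
    for n in range(max_steps):
        if dist(pursuer, evader) <= radius:
            return n
        hp = level_k_heading("pursuer", kp, pursuer, evader, vp, ve, dt)
        he = level_k_heading("evader", ke, pursuer, evader, vp, ve, dt)
        pursuer = move(pursuer, hp, vp, dt)
        evader = move(evader, he, ve, dt)
    return None
```

For example, `level_k_heading("evader", 1, (0.0, 0.0), (0.0, 5.0))` asks what a level-1 evader does when it expects a pure-pursuit pursuer directly behind it, and `steps_to_capture(2, 1, (0.0, 0.0), (0.0, 5.0))` pits a level-2 pursuer against that evader. Each level only needs one recursive call to predict the opponent, so the cost grows linearly with the level.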

Authors: CHEN Suxia; XU Qingwen; LIU Jiufu; XIE Hui; LIU Xiangwu (Department of Computer and Art Design, Henan Light Industry Vocational College, Zhengzhou 450008, China; College of Automation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)

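Affiliations: Department of Computer and Art Design, Henan Light Industry Vocational College; College of Automation, Nanjing University of Aeronautics and Astronautics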

Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal (北大核心), 2024, No. 20, pp. 116-123 (8 pages)

Funding: National Natural Science Foundation of China (61473144).

Keywords: pursuit-evasion game; target tracking; reinforcement learning; bounded rationality

Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

