Autonomous Task Planning Method for Deep Space Probes Based on Reinforcement Learning

An Autonomous Planning Method for Deep Space Exploration Tasks in Reinforcement Learning Based on Dynamic Rewards
Abstract: Deep space probes run multiple subsystems in parallel, and their autonomous mission planning must satisfy many constraints. To meet these requirements, this paper proposes a method for building a reinforcement-learning model for autonomous task planning of deep space probes based on dynamic rewards. An interactive environment for the deep space probe agent is established; a policy network is built together with a loss function that integrates resource, time, and timing (temporal-sequence) constraints; and a dynamic reward mechanism is introduced to improve the conventional policy gradient learning method. Simulation results show that the method achieves autonomous task planning, with clearly higher planning success rate and planning efficiency than the static-reward policy gradient algorithm. It can also start planning from any state without changing the model structure, which improves its adaptability to uncertain planning tasks. The method offers a new solution for autonomous mission planning and decision-making of deep space probes.
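The abstract describes the approach only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' model: a tabular softmax policy trained with REINFORCE (a basic policy-gradient method) on a hypothetical four-task mission. The task set, energy budget, time limit, and the functions `violates`, `reward`, `train`, and `greedy_plan` are illustrative inventions. Resource, time, and timing constraints are enforced here as a reward penalty that ends the episode, rather than through the constraint-fused loss function the paper describes, and the "dynamic" reward (a step reward that grows with plan progress) is an assumed form, since the abstract does not define the mechanism. Setting `dynamic=False` gives the static-reward baseline.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    energy: float       # resource consumed (hypothetical units)
    duration: float     # time consumed (hypothetical units)
    after: tuple = ()   # tasks that must already be done (timing constraint)


# Hypothetical toy mission; none of these values come from the paper.
TASKS = (
    Task("warmup", 2.0, 1.0),
    Task("point", 1.0, 2.0, ("warmup",)),
    Task("image", 3.0, 2.0, ("point",)),
    Task("downlink", 2.0, 3.0, ("image",)),
)
ENERGY_BUDGET = 9.0  # resource constraint (assumed)
TIME_LIMIT = 9.0     # time constraint (assumed)


def violates(task, done, energy_used, time_used):
    """True if running `task` now breaks a resource, time or timing constraint."""
    if task.name in done:
        return True
    if energy_used + task.energy > ENERGY_BUDGET:
        return True
    if time_used + task.duration > TIME_LIMIT:
        return True
    return any(pred not in done for pred in task.after)


def reward(valid, n_done, dynamic):
    """Static: +1 per valid step. Dynamic (assumed form): the step reward
    grows with plan progress. Any violation costs -1 and ends the episode."""
    if not valid:
        return -1.0
    return 1.0 + (n_done / len(TASKS) if dynamic else 0.0)


def policy(theta, state):
    """Tabular softmax policy over task indices, keyed by the set of done tasks."""
    prefs = [theta.get((state, a), 0.0) for a in range(len(TASKS))]
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_episode(theta, rng, dynamic):
    done, energy, elapsed = frozenset(), 0.0, 0.0
    trajectory = []  # (state, action, reward) triples
    while len(done) < len(TASKS):
        probs = policy(theta, done)
        action = rng.choices(range(len(TASKS)), weights=probs)[0]
        task = TASKS[action]
        valid = not violates(task, done, energy, elapsed)
        trajectory.append((done, action, reward(valid, len(done) + 1, dynamic)))
        if not valid:
            return trajectory, False
        done = done | {task.name}
        energy += task.energy
        elapsed += task.duration
    return trajectory, True


def train(episodes=2000, lr=0.1, dynamic=True, seed=0):
    """REINFORCE: theta += lr * G_t * grad log pi(a_t | s_t)."""
    rng = random.Random(seed)
    theta, successes = {}, 0
    for _ in range(episodes):
        trajectory, success = run_episode(theta, rng, dynamic)
        successes += success
        ret = 0.0
        for state, action, r in reversed(trajectory):
            ret += r  # undiscounted return-to-go G_t
            probs = policy(theta, state)
            for a, p in enumerate(probs):
                grad = (1.0 if a == action else 0.0) - p  # d log softmax / d theta
                theta[(state, a)] = theta.get((state, a), 0.0) + lr * ret * grad
    return theta, successes


def greedy_plan(theta):
    """Follow the most probable action from the empty state."""
    done, energy, elapsed, plan = frozenset(), 0.0, 0.0, []
    while len(done) < len(TASKS):
        probs = policy(theta, done)
        task = TASKS[max(range(len(TASKS)), key=probs.__getitem__)]
        if violates(task, done, energy, elapsed):
            return plan, False
        plan.append(task.name)
        done = done | {task.name}
        energy += task.energy
        elapsed += task.duration
    return plan, True


if __name__ == "__main__":
    for dynamic in (False, True):
        theta, successes = train(dynamic=dynamic)
        print("dynamic" if dynamic else "static", successes, greedy_plan(theta))
```

Running the module trains both variants with the same seed and prints the number of successful training episodes and the greedy plan for each; in this toy the only feasible order is warmup, point, image, downlink. Because the policy table is indexed by state, rollouts could in principle begin from any partially completed state, which loosely mirrors the abstract's point that planning can start from an arbitrary state.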
Authors: MAO Weiyang, WANG Bin, LIU Jingxing, XIONG Xin (Faculty of Information Engineering & Automation, Kunming University of Science & Technology, Kunming 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science & Technology, Kunming 650500, China)
Source: Journal of Deep Space Exploration (《深空探测学报(中英文)》; indexed in CSCD and the Peking University Core Journals list), 2023, No. 2, pp. 220-230 (11 pages).
Funding: Civil Aerospace Pre-Research Project.
Keywords: deep space exploration; task planning; policy gradient; reinforcement learning; dynamic reward