Abstract
Aiming at the multi-system parallelism and multi-constraint requirements of autonomous mission planning for deep space probes, a method for constructing a reinforcement-learning-based autonomous mission planning model with dynamic rewards is proposed. An interactive environment for the deep space probe agent is established; a policy network and a loss function integrating resource, time, and timing constraints are constructed; and a dynamic reward mechanism is proposed to improve the traditional policy gradient learning method. Simulation results show that the method achieves autonomous mission planning, with a markedly higher planning success rate and planning efficiency than a static-reward policy gradient algorithm. Moreover, planning can start from any state without changing the model structure, which improves adaptability to uncertain planning tasks. The method provides a new solution for autonomous mission planning and decision-making of deep space probes.
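The core idea summarized above, replacing a static terminal reward with a reward that varies with the planning state, can be illustrated with a minimal REINFORCE-style sketch. Everything here is an assumption for illustration: the toy two-action planning step, the hand-rolled softmax policy, and the shaping rule that scales feedback by the remaining resource fraction are not the paper's actual environment, network, or reward design.

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamic_reward(action, remaining_frac):
    """Illustrative dynamic reward (assumption, not the paper's rule):
    action 0 respects the resource constraint, action 1 violates it,
    and the feedback is scaled by the remaining resource fraction so
    early decisions carry more weight than late ones."""
    base = 1.0 if action == 0 else -1.0
    return base * (0.5 + remaining_frac)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

theta = np.zeros(2)   # logits of a two-action softmax policy
alpha = 0.1           # learning rate

for episode in range(1000):
    remaining = 1.0                       # resource budget for the episode
    grad = np.zeros(2)                    # accumulated score-function gradient
    ret = 0.0                             # episode return
    for step in range(5):                 # five planning steps per episode
        p = softmax(theta)
        a = rng.choice(2, p=p)
        ret += dynamic_reward(a, remaining)
        remaining -= 0.2                  # each step consumes resources
        grad += np.eye(2)[a] - p          # gradient of log-softmax
    theta += alpha * ret * grad           # REINFORCE update

print(softmax(theta)[0])  # probability of the constraint-respecting action
```

After training, the policy strongly prefers the feasible action; the same update rule with a constant reward would learn more slowly early in an episode, which is the intuition behind shaping the reward dynamically.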
Authors
MAO Weiyang; WANG Bin; LIU Jingxing; XIONG Xin
(Faculty of Information Engineering & Automation, Kunming University of Science & Technology, Kunming 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science & Technology, Kunming 650500, China)
Source
Journal of Deep Space Exploration (《深空探测学报(中英文)》)
Indexed in CSCD; Peking University Core Journal
2023, No. 2, pp. 220-230 (11 pages)
Funding
Civil aerospace pre-research project.
Keywords
deep space exploration
task planning
policy gradient
reinforcement learning
dynamic reward