
Path planning for mobile ocean observation network based on reinforcement learning  (Cited by: 3)
Abstract  Reasonable and effective planning of mobile ocean observation platforms benefits both the design of marine environmental observation networks and the efficiency of marine environmental data collection. Given the vastness of the ocean and the limited observation resources available, deep reinforcement learning is used to plan the marine environmental observation network. To address the choice between discrete and continuous action designs when reinforcement learning is applied to path planning, two algorithms, DQN and DDPG, are tested in single-platform and multi-platform experiments. The results show that the reward curve of the discrete-action DQN algorithm is better than that of the continuous-action DDPG algorithm. A further analysis of the sampling paths produced for the mobile observation platforms shows that the sampling results of the discrete-action DQN algorithm are also better. The experiments demonstrate that the discrete-action DQN algorithm maximizes the collection of useful information from the marine environment, confirming the effectiveness and feasibility of the method.
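To make the discrete-action setup concrete, the following is a minimal sketch (not the authors' code) of a DQN agent choosing among four compass moves on a gridded ocean field. The grid size, the reward defined as the field value at each newly visited cell, the network architecture, the hyperparameters, and the use of PyTorch are all illustrative assumptions.

```python
# Minimal sketch: single-platform DQN path planning on a gridded ocean field.
# All environment details (grid size, reward, episode length) are assumptions.
import random
import numpy as np
import torch
import torch.nn as nn

GRID = 10                                      # hypothetical grid resolution
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # discrete moves: N, S, W, E

class QNet(nn.Module):
    """Small MLP mapping a normalized (row, col) position to Q-values."""
    def __init__(self, n_actions=len(ACTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )
    def forward(self, x):
        return self.net(x)

def step(pos, a, field, visited):
    """Move the platform; reward is the field value at a newly visited cell."""
    r, c = pos[0] + ACTIONS[a][0], pos[1] + ACTIONS[a][1]
    r, c = np.clip(r, 0, GRID - 1), np.clip(c, 0, GRID - 1)
    reward = 0.0 if (r, c) in visited else float(field[r, c])
    visited.add((r, c))
    return (r, c), reward

field = np.random.rand(GRID, GRID)             # stand-in "information value" map
qnet, target = QNet(), QNet()
target.load_state_dict(qnet.state_dict())
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
buffer, gamma, eps = [], 0.95, 0.1

for episode in range(200):
    pos, visited = (0, 0), {(0, 0)}
    for t in range(30):                         # fixed mission length
        s = torch.tensor(pos, dtype=torch.float32) / GRID
        a = random.randrange(len(ACTIONS)) if random.random() < eps \
            else int(qnet(s).argmax())
        nxt, r = step(pos, a, field, visited)
        buffer.append((s, a, r, torch.tensor(nxt, dtype=torch.float32) / GRID))
        pos = nxt
        if len(buffer) >= 64:                   # one SGD step on a replay minibatch
            batch = random.sample(buffer, 64)
            s_b = torch.stack([b[0] for b in batch])
            a_b = torch.tensor([b[1] for b in batch])
            r_b = torch.tensor([b[2] for b in batch])
            s2_b = torch.stack([b[3] for b in batch])
            q = qnet(s_b).gather(1, a_b.unsqueeze(1)).squeeze(1)
            with torch.no_grad():
                y = r_b + gamma * target(s2_b).max(1).values
            loss = nn.functional.mse_loss(q, y)
            opt.zero_grad(); loss.backward(); opt.step()
    if episode % 20 == 0:                       # periodic target-network update
        target.load_state_dict(qnet.state_dict())
```

A DDPG variant of this sketch would replace the four-way action set with a continuous heading and add an actor network; that is the discrete-versus-continuous comparison the abstract reports.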
Authors  ZHAO Yuxin, DU Denghui, CHENG Xiaohui, ZHOU Di, DENG Xiong, LIU Yanlong (College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China; China Ship Development and Design Center, Wuhan 430064, China)
Source  CAAI Transactions on Intelligent Systems (《智能系统学报》), indexed in CSCD and the Peking University Core Journal list, 2022, No. 1, pp. 192-200 (9 pages)
Funding  National Natural Science Foundation of China (41676088); Fundamental Research Funds for the Central Universities (3072021CFJ0401).
Keywords  deep reinforcement learning; marine environmental observation; path planning; unmanned survey vessel (USV); Q-learning; multi-agent; deep deterministic policy gradient (DDPG); RankGauss
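Among the keywords, RankGauss refers to the rank-based Gaussian transform often used to normalize skewed inputs before they are fed to a neural network. How the paper applies it to the ocean data is not stated here, so the snippet below is only a generic sketch of the transform itself.

```python
# Generic RankGauss sketch: map values to ranks, rescale to (-1, 1), then apply
# the inverse error function so the result is approximately standard normal.
import numpy as np
from scipy.special import erfinv
from scipy.stats import rankdata

def rank_gauss(x, eps=1e-6):
    """Transform a 1-D array to an approximately Gaussian distribution."""
    r = rankdata(x, method="average")             # ranks in 1..n
    r = (r - 1) / (len(x) - 1)                    # rescale to [0, 1]
    r = np.clip(2 * r - 1, -1 + eps, 1 - eps)     # to (-1, 1), avoid +/- inf
    return erfinv(r) * np.sqrt(2)                 # standard normal quantile

# example: normalize a skewed field before feeding it to a network
field = np.random.lognormal(size=1000)
print(rank_gauss(field).mean(), rank_gauss(field).std())
```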

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部