Autonomous Exploration Methods for Unmanned Aerial Vehicles Based on Deep Reinforcement Learning

Abstract: When exploring unstructured, unknown environments such as mountains and jungles, UAVs must perform environment perception and trajectory planning simultaneously, without any prior knowledge. Traditional methods are constrained by multiple factors, including the algorithms and sensors they rely on, which leads to a limited exploration range, low efficiency, and susceptibility to environmental changes. To address this problem, this study proposes an autonomous exploration method for UAVs based on deep reinforcement learning. The method builds on the normalized advantage functions (NAF) algorithm and introduces three algorithmic enhancement mechanisms to enlarge the exploration range and improve the exploration efficiency of UAVs in unstructured, unknown environments. Experiments in a self-designed simulation environment show that, compared with the original NAF algorithm, the improved algorithm achieves a larger exploration range and higher efficiency, and exhibits better convergence and robustness.
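For readers unfamiliar with the baseline: in the standard NAF formulation (Gu et al., 2016), the action-value function is split as Q(x, u) = V(x) + A(x, u), where the advantage A(x, u) = -1/2 (u - μ(x))^T P(x) (u - μ(x)) is quadratic in the action and P(x) = L(x) L(x)^T is built from a lower-triangular network output L(x) whose diagonal is exponentiated. Because P(x) is positive definite, the greedy action is simply μ(x), which is what makes Q-learning tractable in continuous action spaces such as UAV control. The following plain-Python sketch evaluates this decomposition for given network outputs. It illustrates only the original NAF head, not the paper's three enhancement mechanisms (which the abstract does not describe), and the function names and example values are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def lower_triangular(entries: Sequence[float], n: int) -> Matrix:
    """Fill an n x n lower-triangular L row by row from n(n+1)/2 raw outputs.

    Diagonal entries are exponentiated so that P = L L^T is positive definite,
    as in the original NAF formulation.
    """
    if len(entries) != n * (n + 1) // 2:
        raise ValueError("expected n(n+1)/2 entries")
    L = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            L[i][j] = math.exp(entries[k]) if i == j else entries[k]
            k += 1
    return L


def precision(L: Matrix) -> Matrix:
    """Return P = L L^T."""
    n = len(L)
    return [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def naf_q(v: float, mu: Sequence[float], l_entries: Sequence[float],
          u: Sequence[float]) -> float:
    """Q(x, u) = V(x) + A(x, u) with A = -1/2 (u - mu)^T P (u - mu).

    v, mu and l_entries stand in for the value, mean-action and L heads of a
    NAF network evaluated at some state x (hypothetical inputs here).
    """
    n = len(mu)
    P = precision(lower_triangular(l_entries, n))
    d = [ui - mi for ui, mi in zip(u, mu)]
    quad = sum(d[i] * P[i][j] * d[j] for i in range(n) for j in range(n))
    return v - 0.5 * quad


if __name__ == "__main__":
    # Hypothetical 2-D action, e.g. a heading-rate and a climb-rate command.
    v, mu, l_entries = 1.0, [0.5, -0.5], [0.0, 0.5, 0.0]
    print(naf_q(v, mu, l_entries, mu))           # greedy action: Q equals V
    print(naf_q(v, mu, l_entries, [1.5, -0.5]))  # off-mean action: Q < V
```

With these inputs, the greedy action u = μ(x) recovers Q = V = 1.0, while moving the first action component one unit away from μ(x) lowers Q to 0.5; in NAF, the value of any action is therefore bounded above by V(x).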

Authors: TANG Jianing; LI Chengyang; ZHOU Sida; MA Mengxing; SHI Yang (School of Electrical and Information Technology, Yunnan Minzu University, Kunming 650031, China; Yunnan Key Laboratory of Unmanned Autonomous System, Kunming 650031, China; Institute of Unmanned Autonomous Systems, Yunnan Minzu University, Kunming 650031, China)

Source: Computer Science (CSCD; Peking University Core Journal), 2024, Issue S02, pp. 144-149 (6 pages)

Funding: National Natural Science Foundation of China (61963038, 62063035).

Keywords: Autonomous UAV exploration; Intelligent decision making; Deep reinforcement learning; NAF algorithm; Augmentation mechanism

CLC number: V249 [Aeronautical and Astronautical Science and Technology: Aircraft Design]
