Research on UAV Decision Making Based on an Improved TD3 Algorithm
Abstract: When an unmanned aerial vehicle (UAV) executes a strike mission, uncertain factors such as limited knowledge of the flight environment, a large flight area, sparse targets, and fire threats lead to a low task completion rate. To address this, the paper improves the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm with a double policy network, which reduces the large action fluctuations of a single policy. To address the slow convergence caused by low utilization of high-quality training samples, a prioritized experience replay mechanism is adopted to raise the utilization of those samples. The improved TD3 algorithm is used to train the UAV, which is controlled by changing its yaw angle, pitch angle, and speed, and which completes a close-in strike task while avoiding threats in a three-dimensional environment. Experimental results show that, compared with the traditional TD3 algorithm, the improved algorithm converges faster and raises the UAV's task completion rate by 15% in relative terms.
Authors: Jiang Fangqing (蒋方庆); Chen Zili (陈自力); Gao Xijun (高喜俊); Wang Chunfeng (王春峰); He Daokun (贺道坤) (Shijiazhuang Campus, Army Engineering University, Shijiazhuang 050003, China; School of Intelligent Manufacturing, Nanjing Vocational College of Information Technology, Nanjing 210023, China)
Source: Informatization Research (《信息化研究》), 2023, No. 3, pp. 36-42 (7 pages)
Funding: "14th Five-Year Plan" Equipment Pre-Research Project (No. 50911060101)
Keywords: UAV; Twin Delayed Deep Deterministic Policy Gradient (TD3); double policy network; prioritized experience replay
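
The abstract names prioritized experience replay as the mechanism that raises the utilization of high-quality samples, but this record does not give the authors' formulation. As an illustration only, the following is a minimal sketch of the commonly used proportional variant, assuming priorities are set from absolute TD errors. The class name `PrioritizedReplayBuffer`, the parameters `alpha`, `beta`, and `eps`, and the list-based storage are all hypothetical choices made here, not details taken from the paper.

```python
import random


class PrioritizedReplayBuffer:
    """Illustrative proportional prioritized replay (list-based, O(n) sampling).

    Hypothetical sketch; not the implementation used in the paper.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def __len__(self):
        return len(self.data)

    def add(self, transition):
        # New transitions take the current maximum priority so that they
        # are likely to be replayed before their TD error is known.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        indices = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights offset the bias of non-uniform sampling;
        # they are normalized by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.data[i] for i in indices]
        return batch, indices, weights

    def update_priorities(self, indices, td_errors):
        # After a learning step, priorities follow the new absolute TD errors.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a TD3-style training loop, one would typically call `sample` before each critic update, scale the per-sample critic loss by the returned weights, and then pass the resulting TD errors to `update_priorities`. A sum-tree is usually preferred over plain lists for large buffers; lists are used here only to keep the sketch short.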