UAV reinforcement learning control algorithm with demonstrations

Original title: 示教知识辅助的无人机强化学习控制算法 (cited by: 1)

Abstract: The practical application of reinforcement learning (RL) to autonomous unmanned aerial vehicle (UAV) control is restricted by low learning efficiency. To address this, an RL control algorithm assisted by demonstration knowledge is proposed, which draws on expert experience through learning from demonstrations. A demonstration objective function is established and the value function is modified, so that expert experience enters the actor and critic network updates as a supervisory signal and guides the optimization of the RL-based UAV autonomous control system. Two replay buffers are used, one for expert demonstration samples and one for data generated by interacting with the environment, and a prioritized experience replay mechanism assigns different utilization rates to the experience samples, which improves data efficiency. Simulation results show that, compared with a conventional RL controller for UAVs, the proposed algorithm obtains rewards quickly in the early stage of training and achieves higher rewards throughout the learning process, and that the learned control policy responds faster and is more accurate. Adding demonstration knowledge effectively guides learning and improves the learning efficiency of the UAV autonomous control system; it also improves the performance of the algorithm, which helps it learn a better control policy. In addition, demonstrations broaden the variety of experience data, which benefits the stability of the algorithm and makes the UAV autonomous control system robust to the setting of the reward function.
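The abstract does not give the sampling rule, so the following is only a minimal sketch of how two replay buffers and prioritized replay could fit together. It assumes a common proportional prioritization scheme (priority = (|TD error| + eps + bonus)^alpha) with a constant bonus for demonstration samples; the class name `DualReplay`, the parameters `alpha`, `demo_bonus`, and `eps`, and their default values are all hypothetical and are not taken from the paper.

```python
import random


class DualReplay:
    """Two buffers (expert demonstrations, agent experience) sampled by priority.

    Assumption: proportional prioritization with a constant demonstration bonus.
    This is an illustrative scheme, not the one confirmed by the source.
    """

    def __init__(self, alpha=0.6, demo_bonus=1.0, eps=1e-3):
        self.alpha = alpha            # how strongly priorities skew sampling
        self.demo_bonus = demo_bonus  # extra priority given to expert samples
        self.eps = eps                # keeps every priority strictly positive
        self.demo = []                # entries collected from the expert
        self.agent = []               # entries collected from the environment

    def _priority(self, td_error, is_demo):
        bonus = self.demo_bonus if is_demo else 0.0
        return (abs(td_error) + self.eps + bonus) ** self.alpha

    def add_demo(self, transition, td_error=1.0):
        self.demo.append({"transition": transition, "demo": True,
                          "priority": self._priority(td_error, True)})

    def add_agent(self, transition, td_error=1.0):
        self.agent.append({"transition": transition, "demo": False,
                           "priority": self._priority(td_error, False)})

    def sample(self, batch_size, rng=random):
        """Draw entries from both buffers with probability proportional to priority."""
        pool = self.demo + self.agent
        weights = [entry["priority"] for entry in pool]
        return rng.choices(pool, weights=weights, k=batch_size)

    def update(self, entries, td_errors):
        """Refresh priorities of sampled entries after a learning step."""
        for entry, err in zip(entries, td_errors):
            entry["priority"] = self._priority(err, entry["demo"])

    def demo_fraction(self):
        """Expected share of a batch that comes from the demonstration buffer."""
        demo_mass = sum(entry["priority"] for entry in self.demo)
        total = demo_mass + sum(entry["priority"] for entry in self.agent)
        return demo_mass / total
```

Under these assumptions, demonstration samples dominate early batches when TD errors are comparable, and the share shifts toward agent experience as its priorities grow, which is one way different samples can end up with different utilization rates.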

Authors: SUN Dan (孙丹); GAO Dong (高东); ZHENG Jianhua (郑建华); HAN Peng (韩鹏) (National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)

Affiliations: National Space Science Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Source: Journal of Beijing University of Aeronautics and Astronautics (《北京航空航天大学学报》), 2023, No. 6, pp. 1424-1433 (10 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list (北大核心).

Funding: Beijing Municipal Science and Technology Program (Z191100004319004).

Keywords: reinforcement learning; demonstrations; unmanned aerial vehicle; autonomous control; learning systems

Classification code: V249.12 [Aerospace Science and Technology: Flight Vehicle Design]


