Abstract
When reinforcement learning is applied to robot motion planning, the agent cannot perceive its surroundings or avoid obstacles effectively, so the learned policy does not generalize to complex, challenging terrain. To address this, a multimodal deep reinforcement learning method is proposed for the motion planning of unmanned vehicles; it learns to combine proprioceptive states with high-dimensional depth-sensor inputs. Specifically, proprioceptive states provide contact measurements for immediate reaction, while the onboard visual sensor lets the vehicle learn to anticipate changes in the environment and maneuver proactively around obstacles and uneven terrain several time steps in advance. A new end-to-end multimodal Transformer fusion model, TransProAct (transformer-based proactive action), is proposed. Its self-attention mechanism fuses proprioceptive states with visual information, and the PPO deep reinforcement learning algorithm is used to train the unmanned vehicle to learn motion planning on its own. Multimodal delay randomization is introduced to reduce the gap between simulation and the real world. In evaluations on challenging simulated environments with different obstacles and uneven terrain, the proposed method significantly outperforms the baseline and also generalizes considerably better.
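The abstract names multimodal delay randomization but does not describe its mechanics. One common way to realize the idea, shown here as a minimal sketch and not as the authors' implementation, is to keep a short history per modality and hand the policy a randomly stale observation at each step. The class name `DelayedObservationBuffer`, the function `simulate`, and the delay bounds (1 step for proprioception, 4 steps for depth) are all hypothetical choices made for illustration.

```python
import random
from collections import deque


class DelayedObservationBuffer:
    """Stores recent observations of one modality and returns a randomly stale one.

    Hypothetical helper: the name and interface are assumptions, not from the paper.
    """

    def __init__(self, max_delay_steps, rng):
        self.max_delay_steps = max_delay_steps
        self.rng = rng
        # Keep just enough history to serve the largest possible delay.
        self.history = deque(maxlen=max_delay_steps + 1)

    def push(self, observation):
        """Record the newest observation."""
        self.history.append(observation)

    def sample(self):
        """Return (observation, delay), with delay drawn uniformly at random."""
        if not self.history:
            raise ValueError("no observations pushed yet")
        # Early in an episode the history is shorter than max_delay_steps.
        upper = min(self.max_delay_steps, len(self.history) - 1)
        delay = self.rng.randint(0, upper)  # inclusive on both ends
        return self.history[-1 - delay], delay


def simulate(steps, seed=0):
    """Feed time indices as stand-in observations and collect what a policy would see."""
    rng = random.Random(seed)
    proprio = DelayedObservationBuffer(max_delay_steps=1, rng=rng)  # assumed bound
    depth = DelayedObservationBuffer(max_delay_steps=4, rng=rng)    # assumed bound
    seen = []
    for t in range(steps):
        proprio.push(t)
        depth.push(t)
        p_obs, _ = proprio.sample()
        d_obs, _ = depth.sample()
        seen.append((t, p_obs, d_obs))
    return seen
```

Using the time index as the observation makes the staleness easy to inspect: in each tuple returned by `simulate`, `t - p_obs` and `t - d_obs` are the delays applied to the two modalities. Giving the depth stream a larger bound reflects the assumption that camera pipelines typically lag behind onboard state sensors.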
Authors
Ding Kaiyuan (丁开源); Askar Hamdulla (艾斯卡尔·艾木都拉); Zhu Bin (朱斌); Eksan Firkat (伊克萨尼·普尔凯提); Ma Zhengtang (马正堂)
School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China; Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi 830017, China; Department of Automation, Tsinghua University, Beijing 100084, China
Source
Journal of System Simulation (《系统仿真学报》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2024, No. 11, pp. 2631-2643 (13 pages)
Keywords
multimodal perception
reinforcement learning
unmanned vehicle
motion planning
neural network