Funding: Basic and Advanced Research Projects of CSTC, Grant/Award Number: cstc2019jcyj-zdxmX0008; Science and Technology Research Program of Chongqing Municipal Education Commission, Grant/Award Numbers: KJQN202100634, KJZDK201900605; National Natural Science Foundation of China, Grant/Award Number: 62006065.
Abstract: Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving (AD) system. However, most proposed methods address only one of these two challenges with a single model. To tackle this dilemma, this paper proposes spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting (ST-SIGMA), an efficient end-to-end method that jointly and accurately perceives the AD environment and forecasts the trajectories of the surrounding traffic agents within a unified framework. ST-SIGMA adopts a trident encoder-decoder architecture to learn scene semantics and agent interaction information on bird's-eye view (BEV) maps simultaneously. Specifically, an iterative aggregation network is first employed as the scene semantic encoder (SSE) to learn diverse scene information. To preserve dynamic interactions of traffic agents, ST-SIGMA further exploits a spatio-temporal graph network as the graph interaction encoder. Meanwhile, a simple yet efficient feature fusion method is designed to fuse semantic and interaction features into a unified feature space, which serves as the input to a novel hierarchical aggregation decoder for downstream prediction tasks. Extensive experiments on the nuScenes dataset demonstrate that the proposed ST-SIGMA achieves significant improvements over state-of-the-art (SOTA) methods in both scene perception and trajectory forecasting. The proposed approach therefore outperforms SOTA in terms of model generalisation and robustness and is more feasible for deployment in real-world AD scenarios.
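Abstract: VRML, short for Virtual Reality Modeling Language, is a text-based scene description language for describing three-dimensional environments and is the 3D analogue of HTML. Taking solar and lunar eclipses in celestial motion as an example, this paper discusses how to build a 3D scene of celestial motion with component-based techniques, as well as the means and methods by which a VRML virtual scene interacts with the outside world. Traditional 3D authoring software (such as 3D Max) produces demonstrations that the user cannot control and that do not support real-time interaction. To address these shortcomings, the paper focuses on the method and implementation process of building an interactive celestial-motion scene by combining interpolator nodes with sensor nodes from the VRML node library and by using the Script node to integrate a high-level language (such as JavaScript); human-computer interaction can be achieved through this platform. Because complex interactive 3D motion scenes run slower than desired on a computer, the paper finally proposes optimising the motion scene through grouping and inlining.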
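The pairing of a sensor node with an interpolator node mentioned in the abstract can be illustrated with a small Python sketch. It is not the paper's code and it is not VRML. It only mimics, under simplifying assumptions, how a looping time sensor produces a fraction of its cycle and how a position interpolator maps that fraction to a point by piecewise-linear interpolation between keyframes. The function names `time_sensor_fraction` and `position_interpolator`, the `KEYS` and `PATH` values, and the square "orbit" are all hypothetical and chosen for illustration.

```python
from bisect import bisect_right
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def time_sensor_fraction(now: float, start_time: float, cycle_interval: float) -> float:
    """Simplified looping timer: fraction of the current cycle, in [0, 1)."""
    if cycle_interval <= 0:
        raise ValueError("cycle_interval must be positive")
    return ((now - start_time) / cycle_interval) % 1.0


def position_interpolator(fraction: float,
                          keys: Sequence[float],
                          key_values: Sequence[Vec3]) -> Vec3:
    """Piecewise-linear interpolation of key_values over non-decreasing keys."""
    if not keys or len(keys) != len(key_values):
        raise ValueError("keys and key_values must be non-empty and equal in length")
    # Clamp to the first or last keyframe outside the key range.
    if fraction <= keys[0]:
        return tuple(float(c) for c in key_values[0])
    if fraction >= keys[-1]:
        return tuple(float(c) for c in key_values[-1])
    # Find i such that keys[i - 1] <= fraction < keys[i].
    i = bisect_right(keys, fraction)
    k0, k1 = keys[i - 1], keys[i]
    t = (fraction - k0) / (k1 - k0)
    a, b = key_values[i - 1], key_values[i]
    return tuple(x + t * (y - x) for x, y in zip(a, b))


# Hypothetical keyframes: a coarse square path around the origin (illustrative only).
KEYS = [0.0, 0.25, 0.5, 0.75, 1.0]
PATH = [(2.0, 0.0, 0.0), (0.0, 0.0, 2.0), (-2.0, 0.0, 0.0),
        (0.0, 0.0, -2.0), (2.0, 0.0, 0.0)]

if __name__ == "__main__":
    # "Route" the timer output into the interpolator, 15 s into a 10 s cycle.
    f = time_sensor_fraction(now=15.0, start_time=0.0, cycle_interval=10.0)
    print(f, position_interpolator(f, KEYS, PATH))
```

In a VRML scene the equivalent wiring would be declared with routes between node fields rather than function calls. The sketch only shows the data flow from a timer fraction to an interpolated position.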