Abstract: We introduce a novel framework for 3D scene reconstruction with simultaneous object annotation, using a pre-trained 2D convolutional neural network (CNN), incremental data streaming, and remote exploration with a virtual reality setup. It enables versatile integration of any 2D box detection or segmentation network. We integrate new approaches to (i) asynchronously perform dense 3D reconstruction and object annotation at interactive frame rates, (ii) efficiently optimize CNN results in terms of object prediction and spatial accuracy, and (iii) generate computationally efficient colliders in large triangulated 3D reconstructions at run-time for 3D scene interaction. Our method is novel in combining CNNs that have long and varying inference times with live 3D reconstruction from RGB-D camera input. We further propose a lightweight data structure to store the 3D reconstruction data and object annotations, enabling fast incremental data transmission for real-time exploration with a remote client, which has not been presented before. Our framework achieves update rates of 22 fps (SSD MobileNet) and 19 fps (Mask R-CNN) for indoor environments up to 800 m³. We evaluated the accuracy of 3D object detection. Our work provides a versatile foundation for semantic scene understanding of large streamed 3D reconstructions, while being independent of the CNN's processing time. Source code is available for non-commercial use.
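The asynchronous coupling in (i) can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows one common way to keep a per-frame reconstruction loop independent of a slow detector, assuming a single worker thread and a one-slot hand-off queue. The names `run_pipeline`, `detect`, and `integrate` are hypothetical placeholders for the CNN inference call and the depth-frame integration step.

```python
import queue
import threading


def _drain(results, annotations):
    """Move every finished detection from the results queue into the dict."""
    while True:
        try:
            frame_id, labels = results.get_nowait()
        except queue.Empty:
            return
        annotations[frame_id] = labels


def run_pipeline(num_frames, detect, integrate):
    """Integrate every frame; annotate only frames the detector had time for.

    detect(frame_id)    -> labels  (hypothetical stand-in for slow CNN inference)
    integrate(frame_id) -> None    (hypothetical stand-in for 3D integration)
    """
    pending = queue.Queue(maxsize=1)  # at most one frame waits for the detector
    results = queue.Queue()           # finished (frame_id, labels) pairs

    def worker():
        while True:
            frame_id = pending.get()
            if frame_id is None:      # sentinel: no more frames
                return
            results.put((frame_id, detect(frame_id)))

    thread = threading.Thread(target=worker)
    thread.start()

    annotations = {}
    for frame_id in range(num_frames):
        integrate(frame_id)               # reconstruction never blocks on the CNN
        try:
            pending.put_nowait(frame_id)  # hand the frame over if the slot is free
        except queue.Full:
            pass                          # slot occupied: skip detection for this frame
        _drain(results, annotations)      # attach whatever detections are ready

    pending.put(None)                     # blocks until the slot is free, then stops the worker
    thread.join()
    _drain(results, annotations)          # collect detections that finished late
    return annotations
```

With this design the reconstruction rate is bounded by `integrate` alone; a slower `detect` only reduces how many frames receive annotations, which is the sense in which the loop is independent of the detector's processing time.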