Background: In this study, we propose a novel 3D scene graph prediction approach for scene understanding from point clouds. Methods: The approach automatically organizes the entities of a scene in a graph, where objects are nodes and their relationships are modeled as edges. More specifically, we employ the DGCNN to capture the features of objects and their relationships in the scene. A Graph Attention Network (GAT) is introduced to exploit latent features obtained from the initial estimation to further refine the object arrangement in the graph structure. A loss function modified from cross entropy with a variable weight is proposed to solve the multi-category problem in the prediction of objects and predicates. Results: Experiments reveal that the proposed approach performs favorably against state-of-the-art methods in terms of predicate classification and relationship prediction, and achieves comparable performance on object classification. Conclusions: The 3D scene graph prediction approach can form an abstract description of the scene space from point clouds.
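The abstract does not say how the variable weight is chosen, so the following is only a minimal sketch of a class-weighted cross entropy, assuming inverse-frequency weights. The function names and the weighting rule are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(labels, num_classes):
    """Assumed weighting rule: rarer classes receive larger weights.

    Classes absent from `labels` get weight 0.0 to avoid division by zero.
    """
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_classes * c) if c else 0.0 for c in counts]


def weighted_cross_entropy(batch_logits, labels, weights):
    """Weighted mean of -w[y] * log(p[y]) over the batch."""
    total, norm = 0.0, 0.0
    for logits, y in zip(batch_logits, labels):
        p = softmax(logits)
        total += -weights[y] * math.log(p[y])
        norm += weights[y]
    return total / norm
```

With uniform weights this reduces to ordinary cross entropy. Skewing the weights toward rare object or predicate classes is one common way to counter class imbalance.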
A scene graph is an infrastructure component of a virtual reality system that organizes the virtual scene with abstraction. It provides facilities for the rendering engine and should be integrated effectively, on demand, into a real-time system in which large quantities of scene objects and resources can be manipulated and managed with high flexibility and reliability. We present a new scheme of multiple scene graphs to accommodate the features of rendering engines and distributed systems. Building on that scheme, further functions, e.g. block query, interactive editing, permission management, instance response, "redo" and "undo", are implemented to satisfy various requirements. At the same time, our design is compatible with the popular C/S architecture and has good concurrent performance. Above all, it is convenient to use for further development. Experimental results, including response times, demonstrate its good performance.
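As an illustration of the "redo" and "undo" editing functions mentioned above, here is a minimal sketch using a command stack over a toy scene graph. The classes `SceneGraph`, `AddNode` and `History` are hypothetical and do not reproduce the paper's multi-graph design.

```python
class SceneGraph:
    """Toy scene graph: maps each node name to the list of its child names."""

    def __init__(self):
        self.children = {"root": []}

    def add(self, name, parent="root"):
        self.children[parent].append(name)
        self.children[name] = []

    def remove(self, name, parent="root"):
        # Only intended for leaf nodes; enough for undoing an add.
        self.children[parent].remove(name)
        del self.children[name]


class AddNode:
    """Editing command that knows how to apply and revert itself."""

    def __init__(self, name, parent="root"):
        self.name, self.parent = name, parent

    def do(self, graph):
        graph.add(self.name, self.parent)

    def undo(self, graph):
        graph.remove(self.name, self.parent)


class History:
    """Two stacks: executed commands and undone commands."""

    def __init__(self, graph):
        self.graph = graph
        self.done, self.undone = [], []

    def execute(self, cmd):
        cmd.do(self.graph)
        self.done.append(cmd)
        self.undone.clear()  # a new edit invalidates the redo stack

    def undo(self):
        if self.done:
            cmd = self.done.pop()
            cmd.undo(self.graph)
            self.undone.append(cmd)

    def redo(self):
        if self.undone:
            cmd = self.undone.pop()
            cmd.do(self.graph)
            self.done.append(cmd)
```

Because commands are undone in last-in, first-out order, a child node is always removed before its parent, which keeps the toy `remove` safe.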
In this paper, a novel component-based scene graph is proposed, in which all objects in the scene are classified into different entities, and a scene can be represented as a hierarchical graph composed of instances of those entities. Each entity contains basic data and its operations, which are encapsulated in the entity component. The entity possesses certain behaviours, which are responses to rules and interactions defined by the high-level application. Such behaviours can be described by scripts or behaviour models. The component-based scene graph in this paper is more abstract and higher-level than traditional scene graphs. The contents of a scene can be extended flexibly by adding new entities and new entity components, and behaviours can be modified by changing the model components or behaviour scripts. Its robustness and efficiency are verified by many examples implemented in the Virtual Scenario developed by Peking University.
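The entity/component/behaviour split described above can be sketched as a small data structure. This is an illustrative assumption about one possible layout, not the paper's implementation; `Entity`, `dispatch` and `open_door` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Entity:
    """One node of the hierarchical graph: data components plus behaviours."""

    name: str
    components: Dict[str, dict] = field(default_factory=dict)
    behaviours: Dict[str, Callable[["Entity"], None]] = field(default_factory=dict)
    children: List["Entity"] = field(default_factory=list)

    def add_child(self, child: "Entity") -> "Entity":
        self.children.append(child)
        return child

    def dispatch(self, event: str) -> List[str]:
        """Run the behaviour bound to `event` on this entity and its descendants.

        Returns the names of the entities that responded, in depth-first order.
        """
        responded = []
        handler = self.behaviours.get(event)
        if handler is not None:
            handler(self)
            responded.append(self.name)
        for child in self.children:
            responded.extend(child.dispatch(event))
        return responded


def open_door(entity: Entity) -> None:
    """Example behaviour script: mutate the entity's own state component."""
    entity.components["state"]["open"] = True
```

Swapping the callable stored under `behaviours["interact"]` changes how the entity reacts without touching its data, which mirrors the extensibility the abstract describes.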
Scene graphs of point clouds help to understand object-level relationships in 3D space. Most graph generation methods work on 2D structured data and cannot be used for unstructured 3D point cloud data. Existing point-cloud-based methods generate the scene graph from an additional graph structure that requires labor-intensive manual annotation. To address these problems, we explore a method that converts point clouds into structured data and generates graphs without given structures. Specifically, we cluster points with similar augmented features into groups and establish their relationships, resulting in an initial structural representation of the point cloud. In addition, we propose a Dynamic Graph Generation Network (DGGN) to judge the semantic labels of targets of different granularity. It dynamically splits and merges point groups, resulting in a scene graph with high precision. Experiments show that our methods outperform other baseline methods and output reliable graphs describing object-level relationships without additional manually labeled data.
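The initial step, grouping similar points and linking the groups, can be illustrated with a simple stand-in. The sketch below uses distance-threshold (single-linkage) grouping on raw coordinates and connects groups with nearby centroids. It is not the DGGN, and `radius` and `link_dist` are hypothetical parameters.

```python
import math
from itertools import combinations


def group_points(points, radius):
    """Union-find grouping: points closer than `radius` end up in one group."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < radius:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def centroid(points, idxs):
    """Mean position of the points selected by `idxs`."""
    dim = len(points[0])
    return tuple(sum(points[i][d] for i in idxs) / len(idxs) for d in range(dim))


def build_graph(points, groups, link_dist):
    """Nodes are group centroids; an edge joins groups closer than `link_dist`."""
    cents = [centroid(points, g) for g in groups]
    edges = [
        (a, b)
        for a, b in combinations(range(len(groups)), 2)
        if math.dist(cents[a], cents[b]) < link_dist
    ]
    return cents, edges
```

A learned method would replace the fixed thresholds with feature similarity and would split or merge groups dynamically. The sketch only shows the shape of the intermediate representation.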
An ultra-massive distributed virtual environment generally consists of ultra-massive terrain data and a large quantity of objects and their attribute data, such as 2D/3D geometric models, audio/video, images, vectors, characteristics, etc. In this paper, we propose a novel method for constructing distributed scene graphs with high extensibility. The method supports highly concurrent client interaction and implements various tasks such as editing, querying, accessing and motion control. Application experiments are performed to demonstrate its efficiency and soundness.
Traffic scene captioning technology automatically generates one or more sentences describing a traffic scene by analyzing the content of the input traffic scene images, helping to ensure road safety while providing an important decision-making function for sustainable transportation. To provide a comprehensive and reasonable description of complex traffic scenes, this paper proposes a traffic scene semantic captioning model with multi-stage feature enhancement. In general, the model follows an encoder-decoder structure. First, multilevel granularity visual features are used for feature enhancement during encoding, which enables the model to learn more detailed content in the traffic scene image. Second, a scene knowledge graph is applied during decoding, and the semantic features it provides are used to enhance the features learned by the decoder again, so that the model can learn the attributes of objects in the traffic scene and the relationships between objects and thus generate more reasonable captions. The paper reports extensive experiments on the challenging MS-COCO dataset, evaluated with five standard automatic evaluation metrics. The results show that the proposed model improves significantly on all metrics compared with state-of-the-art methods, notably achieving a score of 129.0 on the CIDEr-D metric, which also indicates that the model can provide a more reasonable and comprehensive description of the traffic scene.
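To address the forgetting and insufficient use of an image's text information in image captioning methods, a Scene Graph-aware Cross-modal interaction Network (SGC-Net) is proposed. First, the scene graph is used as the visual feature of the image, and a Graph Convolutional Network (GCN) is used for feature fusion, so that the visual and textual features of the image lie in the same feature space. Second, the text sequence generated by the model is stored, and the corresponding position information is added to form the textual feature of the image, which addresses the loss of textual features caused by a single-layer Long Short-Term Memory (LSTM) network. Finally, a self-attention mechanism extracts the important image and text information and fuses them, which addresses over-reliance on image information and insufficient use of text information. Experiments on the Flickr30K and MSCOCO (MicroSoft Common Objects in COntext) datasets show that, compared with Sub-GC, SGC-Net improves BLEU1 (BiLingual Evaluation Understudy with 1-gram), BLEU4 (BiLingual Evaluation Understudy with 4-grams), METEOR (Metric for Evaluation of Translation with Explicit ORdering), ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and SPICE (Semantic Propositional Image Caption Evaluation) by 1.1, 0.9, 0.3, 0.7 and 0.4 on Flickr30K and by 0.3, 0.1, 0.3, 0.5 and 0.6 on MSCOCO, respectively. The method used by SGC-Net therefore effectively improves both the model's image captioning performance and the fluency of the generated captions.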
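The final fusion step, applying self-attention to image and text features and then combining them, can be sketched as scaled dot-product self-attention over the joint token sequence followed by mean pooling. This is an assumption-laden illustration and not SGC-Net itself: identity projections replace the learned query, key and value weights, and `fuse` is a hypothetical name.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens (no learned weights)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, tokens)) for j in range(d)])
    return out


def fuse(image_feats, text_feats):
    """Attend over the concatenated image and text tokens, then mean-pool."""
    attended = self_attention(image_feats + text_feats)
    d = len(attended[0])
    return [sum(row[j] for row in attended) / len(attended) for j in range(d)]
```

Attending over one joint sequence lets every text token weigh every image token and vice versa, which is one simple way to balance reliance on the two modalities.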
Funding (3D scene graph prediction from point clouds): Supported by the National Natural Science Foundation of China (61872024) and the National Key R&D Program of China under Grant 2018YFB2100603.
Funding (multiple scene graphs for virtual reality systems): Supported by the National Natural Science Foundation of China (Nos. 61173080, 61232014, 61472010, 61421062) and the National Key Technology Support Program of China (No. 2013BAK03B07).
Funding (component-based scene graph): Project supported by the National Basic Research Program (973) of China (No. 2004CB719403) and the National Natural Science Foundation of China (Nos. 60573151 and 60473100).
Funding (Dynamic Graph Generation Network for point clouds): This work was supported by the National Natural Science Foundation of China (Nos. 62173045 and 61673192), the Fundamental Research Funds for the Central Universities (No. 2020XD-A04-2), and the BUPT Excellent PhD Students Foundation (No. CX2021222).
Funding (distributed scene graphs for ultra-massive virtual environments): Supported by the National Basic Research Program of China (Grant No. 2004CB719403), the National High-Tech Research & Development Program of China (Grant Nos. 2006AA01Z334, 2007AA01Z318, 2009AA01Z324), the National Natural Science Foundation of China (Grant Nos. 60573151, 60703062, 60833007), and the Marine 908-03-01-10 Project.
Funding (traffic scene captioning): Funded by (i) the Natural Science Foundation of China (NSFC) under Grant Nos. 61402397, 61263043, 61562093 and 61663046; (ii) the Open Foundation of Key Laboratory in Software Engineering of Yunnan Province, No. 2020SE304; and (iii) the Practical Innovation Project of Yunnan University, Project Nos. 2021z34, 2021y128 and 2021y129.