Abstract: Efficient perception of the real world is a long-standing goal of computer vision. Modern visual computing techniques have succeeded in attaching semantic labels to thousands of everyday objects and in reconstructing dense depth maps of complex scenes. However, simultaneous semantic and spatial perception, so-called dense 3D semantic mapping, which estimates the 3D geometry of a scene and attaches semantic labels to that geometry, remains a challenging problem that, if solved, would make structured visual understanding and editing far more widely accessible. Concurrently, progress in computer vision and machine learning has motivated the pursuit of understanding and digitally reconstructing the surrounding world. Neural metric-semantic understanding is a new and rapidly emerging field that combines differentiable machine learning techniques with physical knowledge from computer vision, e.g., the integration of visual-inertial simultaneous localization and mapping (SLAM), mesh reconstruction, and semantic understanding. In this paper, we summarize the recent trends and applications of neural metric-semantic understanding. Starting with an overview of the underlying computer vision and machine learning concepts, we discuss critical aspects of such perception approaches. Specifically, our emphasis is on fully leveraging the joint semantic and 3D information. We then present important applications of this perception capability, such as novel view synthesis and semantic augmented reality (AR) content manipulation. Finally, we conclude with a discussion of the technical implications of the technology under a 5G edge computing scenario.
Abstract: The traditional strategy of 3D model reconstruction concentrates mainly on orthographic projections or engineering drawings, but it has several shortcomings: only a few kinds of solids can be reconstructed, the time complexity is high, and little information about the 3D model is recovered. This research extends the strategy by treating the process card as part of the 3D reconstruction. A set of process data is a superset of the 2D engineering drawing set: it comprises process drawings and process steps, and it records the sequential, asymptotic course by which a part is made from rough blank to final product. According to these characteristics, the object to be reconstructed is translated from complicated engineering drawings into a series of much simpler process drawings. With the plentiful process information added for reconstruction, disturbances such as irrelevant graphics, symbols, and labels can be avoided. Moreover, the change of form between neighboring process drawings is so small that interpreting the engineering drawings presents no difficulty; in addition, abnormal and multiple solutions can be avoided during reconstruction, and the problem of applicability to more objects is ultimately solved. A practical method for 3D model reconstruction therefore becomes possible. On the other hand, the feature information in process cards is provided to the reconstruction model. Focusing on process cards, the feasibility and requirements of Working Procedure Model reconstruction are analyzed, and a method for applying Natural Language Understanding to 3D reconstruction is studied. A method of asymptotic approximation of the product is proposed, by which a 3D process model can be constructed automatically and intelligently. The process model not only includes information about part features but can also deliver design, process, and engineering information to downstream applications.
Funding: Supported by the National Natural Science Foundation of China (61872024) and the National Key R&D Program of China under Grant 2018YFB2100603.
Abstract: Background In this study, we propose a novel 3D scene graph prediction approach for scene understanding from point clouds. Methods The approach automatically organizes the entities of a scene in a graph, where objects are nodes and their relationships are modeled as edges. More specifically, we employ DGCNN to capture the features of objects and their relationships in the scene. A Graph Attention Network (GAT) is introduced to exploit latent features obtained from the initial estimation and further refine the object arrangement in the graph structure. A loss function modified from cross-entropy with a variable weight is proposed to address the multi-category problem in predicting objects and predicates. Results Experiments reveal that the proposed approach performs favorably against state-of-the-art methods in predicate classification and relationship prediction, and achieves comparable performance on object classification. Conclusions The 3D scene graph prediction approach can form an abstract description of the scene space from point clouds.
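The abstract does not spell out the modified loss. A minimal sketch of a cross-entropy with a variable per-class weight follows; the function name and the exact weighting scheme are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def weighted_cross_entropy(logits, target, class_weights):
    """Cross-entropy whose contribution is scaled by a variable per-class
    weight, e.g. to counter class imbalance among objects and predicates.
    (Illustrative form only; the paper's exact formulation may differ.)"""
    z = logits - logits.max()                # numerically stabilised softmax
    probs = np.exp(z) / np.exp(z).sum()
    return -class_weights[target] * np.log(probs[target])
```

With uniform logits over four classes and unit weights, the loss reduces to log 4, matching the unweighted case.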
Funding: Supported by the National Natural Science Foundation of China (No. 61976023).
Abstract: In this paper, we propose a Structure-Aware Fusion Network (SAFNet) for 3D scene understanding. Because 2D images present more detailed information while 3D point clouds convey more geometric information, fusing these two complementary kinds of data can improve the discriminative ability of a model. Fusion is a challenging task, since 2D and 3D data are essentially different and have different formats. Existing methods first extract 2D multi-view image features and then aggregate them onto sparse 3D point clouds, achieving superior performance; however, they ignore the structural relations between pixels and points and fuse the two modalities directly, without adaptation. To address this, we propose a structural deep metric learning method on pixels and points that explores these relations and further uses them to adaptively map the images and point clouds into a common canonical space for prediction. Extensive experiments on the widely used ScanNetV2 and S3DIS datasets verify the performance of the proposed SAFNet.
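The mapping into a common canonical space can be pictured with a generic metric-learning sketch: hypothetical linear projections plus a margin loss that aligns matched pixel/point embeddings. This is an assumption-laden illustration, not SAFNet's actual architecture:

```python
import numpy as np

def project(feats, W):
    """Map features into a common canonical space and L2-normalise."""
    z = feats @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def pixel_point_metric_loss(pix, pts, margin=0.5):
    """Pull each pixel embedding towards its matched point embedding
    (the diagonal) and push it away from the hardest mismatched one."""
    sim = pix @ pts.T                                   # cosine similarities
    pos = np.diag(sim)                                  # matched pixel/point pairs
    neg = (sim - 1e9 * np.eye(len(sim))).max(axis=1)    # hardest negatives
    return np.maximum(0.0, margin + neg - pos).mean()
```

When the two modalities already coincide in the common space, the margin is satisfied and the loss vanishes.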
Abstract: This research aimed to combine three cell and tissue culture technologies to obtain mechanistic insights into cells in porous scaffolds. When cultivated on 2D (two-dimensional) surfaces, HDFs (human dermal fibroblasts) behaved individually and had no strict requirement on seeding density for proliferation, while HaCat cells relied heavily on initial densities for proliferation and colony formation, which was facilitated when co-cultured with HDFs. Experiments using a 3D CCIS (three-dimensional cell culture and imaging system) indicated that HDFs colonised open pores of varying sizes (125-420 μm) on modular substrates via bridge structures, while HaCat cells formed aperture structures and colonised only small pores (125 μm). When co-cultured, HDFs not only facilitated HaCat attachment on the substrates but also coordinated with HaCat cells to colonise open pores of varying sizes via bridge and aperture structures. Based on these observations, a two-stage strategy for the culture of HDFs and HaCat cells on porous scaffolds was proposed and applied successfully to a cellulosic scaffold. This research demonstrated that cell colonisation in scaffolds depends on multiple factors, and that the integrated 2D and 3D culture technologies together with the 3D CCIS are an effective and efficient approach to obtaining mechanistic insights into their influences on tissue regeneration.
Funding: Supported by the National Natural Science Foundation of China (62132021, 62102435, 62002375, 62002376), the National Key R&D Program of China (2018AAA0102200), and NUDT Research Grants (ZK19-30).
Abstract: Relation contexts have proved useful for many challenging vision tasks. In the field of 3D object detection, previous methods have taken advantage of context encoding, graph embedding, or explicit relation reasoning to extract relation contexts. However, redundant relation contexts inevitably arise from noisy or low-quality proposals. In fact, invalid relation contexts usually indicate underlying scene misunderstanding and ambiguity, which may, on the contrary, reduce performance in complex scenes. Inspired by recent attention mechanisms such as the Transformer, we propose a novel 3D attention-based relation module (ARM3D). It encompasses object-aware relation reasoning to extract pair-wise relation contexts among qualified proposals and an attention module to distribute attention weights over the different relation contexts. In this way, ARM3D can take full advantage of the useful relation contexts and filter out those that are less relevant or even confusing, which mitigates ambiguity in detection. We have evaluated the effectiveness of ARM3D by plugging it into several state-of-the-art 3D object detectors, showing more accurate and robust detection results. Extensive experiments demonstrate the capability and generalization of ARM3D on 3D object detection. Our source code is available at https://github.com/lanlan96/ARM3D.
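The attention module's role of down-weighting low-quality relation contexts can be illustrated by a plain scaled dot-product attention step. This is a hypothetical simplification for intuition only, not ARM3D's actual module:

```python
import numpy as np

def attend_relations(query, contexts):
    """Score each pair-wise relation context against a proposal's query
    feature, then blend the contexts by softmax attention weights so that
    less relevant or confusing contexts contribute little."""
    scores = contexts @ query / np.sqrt(len(query))  # scaled dot-product scores
    w = np.exp(scores - scores.max())
    w = w / w.sum()                                  # attention weights sum to 1
    return w, w @ contexts                           # weights and fused context
```

A context aligned with the query dominates the weights, while unrelated contexts are suppressed.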
Funding: Project partially supported by the Ministry of Culture, Sports and Tourism and the Korea Creative Content Agency under the Culture Technology Research & Development Program 2014 (50%), and by the Next Generation Information Computing Development Program through the National Research Foundation of Korea, funded by the Ministry of Science, ICT and Future Planning (No. 2012M3C4A7032185) (50%).
Abstract: The objective of this research is the rapid reconstruction of ancient buildings of historical importance using a single image. The key idea of our approach is to reduce the infinite solutions that might otherwise arise when recovering a 3D geometry from 2D photographs. The main outcome of our research shows that the proposed methodology can be used to reconstruct ancient monuments for use as proxies for digital effects in applications such as tourism, games, and entertainment, which do not require very accurate modeling. In this article, we consider the reconstruction of ancient Mughal architecture, including the Taj Mahal. We propose a modeling pipeline that makes an easy reconstruction possible from a single photograph taken from a single view, without the need to create complex point clouds from multiple images or to use laser scanners. First, an initial model is automatically reconstructed using locally fitted planar primitives along with their boundary polygons and the adjacency relations among parts of the polygons. This approach is faster and more accurate than creating a model from scratch, because the initial reconstruction phase provides a set of structural information together with the adjacency relations, which makes it possible to estimate the approximate depth of the entire monument. Next, we use manual extrapolation and editing techniques with modeling software to assemble and adjust the different 3D components of the model. Thus, this research opens up the opportunity for the present generation to experience remote sites of architectural and cultural importance through virtual worlds and real-time mobile applications. Variations of a recreated 3D monument representing an amalgam of various cultures are targeted for future work.
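The "locally fitted planar primitives" step amounts to fitting a plane to each local patch of recovered points. A standard least-squares fit via SVD is sketched below as a generic illustration, not the paper's exact procedure:

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through a local patch of 3D points.
    Returns a unit normal n and offset d such that n·p + d ≈ 0
    for every point p on the plane."""
    c = points.mean(axis=0)                 # patch centroid lies on the plane
    _, _, vt = np.linalg.svd(points - c)
    n = vt[-1]                              # direction of least variance = normal
    return n, -n @ c
```

For a patch lying exactly in the z = 0 plane, the fit recovers a normal of ±(0, 0, 1) with zero offset.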
Abstract: In this paper, a novel data- and rule-driven system for 3D scene description and segmentation in an unknown environment is presented. The system generates hierarchies of features that correspond to structural elements, such as boundaries and shape classes of individual objects, as well as relationships between objects. It is implemented as a high-level component added to an existing low-level binocular vision system [1]. Based on a pair of matched stereo images produced by that system, 3D segmentation is first performed to group object boundary data into several edge sets, each of which is believed to belong to a particular object. Gross features of each object are then extracted and stored in an object record. The final structural description of the scene is produced from the information in the object record, a set of rules, and a rule implementor. The system is designed to handle partially occluded objects of different shapes and sizes in the 2D image. Experimental results have shown its success in computing both object-level and structure-level descriptions of common man-made objects.
Funding: This work received funding from the Autonomous Region of Sardinia under project XDATA. Eva Almansa, Armando Sanchez, Giorgio Vassena, and Enrico Gobbetti received funding from the European Union's H2020 research and innovation programme under grant 813170 (EVOCATION).
Abstract: We introduce a novel end-to-end deep-learning solution for rapidly estimating a dense spherical depth map of an indoor environment. Our input is a single equirectangular image registered with a sparse depth map, as provided by a variety of common capture setups. Depth is inferred by an efficient and lightweight single-branch network, which employs a dynamic gating system to jointly process dense visual data and sparse geometric data. We exploit the characteristics of typical man-made environments to efficiently compress multi-resolution features and to find short- and long-range relations among scene parts. Furthermore, we introduce a new augmentation strategy that makes the model robust to different types of sparsity, including those generated by various structured-light sensors and LiDAR setups. The experimental results demonstrate that our method provides interactive performance and outperforms state-of-the-art solutions in computational efficiency, adaptivity to variable depth sparsity patterns, and prediction accuracy on challenging indoor data, even when trained solely on synthetic data without any fine-tuning.
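How a dynamic gate might arbitrate between the dense visual branch and the sparse geometric branch can be sketched as below. This is a hypothetical fixed-form per-pixel gate for intuition; the paper's network learns its gating rather than using this formula:

```python
import numpy as np

def gated_fusion(f_visual, f_depth, valid):
    """Blend dense visual features with sparse geometric features.
    `valid` is 1 where a sparse depth sample exists and 0 elsewhere;
    pixels without a sample fall back to the visual branch alone."""
    gate = valid / (1.0 + np.exp(-(f_visual + f_depth)))  # sigmoid gate, zeroed where no depth
    return gate * f_depth + (1.0 - gate) * f_visual
```

Where `valid` is 0 the gate vanishes, so the output reduces exactly to the visual feature, mirroring the desired robustness to variable sparsity patterns.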