Funding: This work was supported in part by the National Natural Science Foundation of China under Grants U20A20225 and 61833013, and in part by the Shaanxi Provincial Key Research and Development Program under Grant 2022-GY111.
Abstract: This paper proposes an improved high-precision 3D semantic mapping method for indoor scenes using RGB-D images. Current semantic mapping algorithms suffer from low semantic annotation accuracy and insufficient real-time performance. To address these issues, we first adopt the Elastic Fusion algorithm to select key frames from indoor image sequences captured by a Kinect sensor and to construct a spatial model of the indoor environment. Then, an indoor RGB-D image semantic segmentation network is proposed, which uses multi-scale feature fusion to quickly and accurately obtain pixel-level object labels for the spatial point cloud model. Finally, Bayesian updating is used to fuse semantic labels incrementally into the established point cloud model, and dense conditional random fields (CRF) are employed to optimize the 3D semantic map, resulting in a high-precision spatial semantic map of indoor scenes. Experimental results show that the proposed semantic mapping system can process image sequences collected by RGB-D sensors in real time, output accurate semantic segmentation results for indoor scene images together with the current local semantic map, and ultimately construct a globally consistent, high-precision 3D semantic map of indoor scenes.
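The incremental label fusion step lends itself to a compact illustration. The sketch below is a simplified stand-in, not the authors' implementation: each map point keeps a categorical label distribution that is multiplied by the per-frame class probabilities predicted by the segmentation network and then renormalized, which is the standard recursive Bayesian update used for semantic label fusion; the subsequent dense-CRF refinement is omitted.

```python
import numpy as np

def bayesian_label_update(point_probs: np.ndarray,
                          frame_probs: np.ndarray,
                          eps: float = 1e-12) -> np.ndarray:
    """Recursive Bayesian fusion of per-point class probabilities.

    point_probs : (N, C) current label distribution of N map points.
    frame_probs : (N, C) class probabilities predicted by the segmentation
                  network for the pixels that project onto those points.
    Returns the updated (N, C) distribution.
    """
    fused = point_probs * frame_probs                   # likelihood product
    fused /= fused.sum(axis=1, keepdims=True) + eps     # renormalize per point
    return fused

# Toy usage: 2 points, 3 hypothetical classes (wall / floor / chair), uniform prior.
prior = np.full((2, 3), 1.0 / 3.0)
frame = np.array([[0.7, 0.2, 0.1],
                  [0.2, 0.5, 0.3]])
posterior = bayesian_label_update(prior, frame)
print(posterior.argmax(axis=1))   # current MAP label of each point: [0 1]
```

Repeating the update over successive frames sharpens the per-point distribution toward the consistently observed class, which is what makes the fusion robust to occasional segmentation errors.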
Funding: Supported by the National Natural Science Foundation of China (Nos. 61305042, 61202098), the Projects of the Center for Remote Sensing Mission Study of the China National Space Administration (No. 2012A03A0939), and the Science and Technological Research of Key Projects of the Education Department of Henan Province of China (No. 13A520071).
Abstract: Cross-modal semantic mapping and cross-media retrieval are key problems for multimedia search engines. This study analyzes the hierarchy, functionality, and structure of the visual and auditory sensations of the cognitive system, and establishes a brain-like cross-modal semantic mapping framework based on cognitive computing of visual and auditory sensations. The framework takes into account the mechanisms of visual-auditory multisensory integration, selective attention in the thalamo-cortical system, emotional control in the limbic system, and memory enhancement in the hippocampus. Algorithms for cross-modal semantic mapping are then given. Experimental results show that the framework can be applied effectively to cross-modal semantic mapping and offers useful guidance for brain-like computing on non-von Neumann architectures.
Funding: Projects (61203330, 61104009, 61075092) supported by the National Natural Science Foundation of China; Project (2013M540546) supported by the China Postdoctoral Science Foundation; Projects (ZR2012FM031, ZR2011FM011, ZR2010FM007) supported by the Shandong Provincial Natural Science Foundation, China; Projects (2011JC017, 2012TS078) supported by the Independent Innovation Foundation of Shandong University, China; Project (201203058) supported by the Shandong Provincial Postdoctoral Innovation Foundation, China.
Abstract: Quick response (QR) code based artificial labels are applied to provide semantic concepts and relations about the surroundings, addressing the complexity and limitations of semantic recognition and scene understanding based on robot vision alone. By imitating the spatial cognition mechanism of humans, the robot continuously receives information from artificial labels placed at cognitive-guide points throughout a large structured environment to achieve environmental perception and navigation. An immune network algorithm forms the environmental awareness mechanism with a "distributed representation". Color recognition and SIFT feature matching are fused to achieve memory and cognition of scenario tags, and a cognition-guide-action based cognizing semantic map is then built. As the map grows richer, the robot no longer needs to rely on the artificial labels and can plan paths and navigate freely. Experimental results show that the artificial labels designed in this work improve the cognitive ability of the robot, enable navigation in semi-unknown environments, and support construction of the cognizing semantic map.
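The tag-recognition step that fuses color recognition with SIFT feature matching can be sketched with OpenCV as below. This is only an illustrative combination under assumed parameters (Lowe's ratio test at 0.75, at least 15 good matches, hue-histogram correlation of 0.6); the paper's actual decision rule and thresholds are not reproduced here.

```python
import cv2
import numpy as np

def tag_matches(template_bgr: np.ndarray, scene_bgr: np.ndarray,
                min_good_matches: int = 15, min_hist_sim: float = 0.6) -> bool:
    """Decide whether the scene view contains the tag template by fusing
    SIFT keypoint matching with a hue-histogram similarity check."""
    sift = cv2.SIFT_create()
    _, des_t = sift.detectAndCompute(cv2.cvtColor(template_bgr, cv2.COLOR_BGR2GRAY), None)
    _, des_s = sift.detectAndCompute(cv2.cvtColor(scene_bgr, cv2.COLOR_BGR2GRAY), None)
    if des_t is None or des_s is None:
        return False

    # Lowe's ratio test on 2-nearest-neighbour SIFT matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [pair[0] for pair in matcher.knnMatch(des_t, des_s, k=2)
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]

    # Hue-histogram correlation as the color-recognition cue.
    def hue_hist(img_bgr):
        hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0], None, [32], [0, 180])
        return cv2.normalize(hist, hist).flatten()

    hist_sim = cv2.compareHist(hue_hist(template_bgr), hue_hist(scene_bgr),
                               cv2.HISTCMP_CORREL)

    # Both cues must agree, which makes the recognition more robust than
    # either geometric matching or color alone.
    return len(good) >= min_good_matches and hist_sim >= min_hist_sim
```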
Abstract: Efficient perception of the real world is a long-standing goal of computer vision. Modern visual computing techniques have succeeded in attaching semantic labels to thousands of daily objects and in reconstructing dense depth maps of complex scenes. However, simultaneous semantic and spatial joint perception, so-called dense 3D semantic mapping (estimating the 3D geometry of a scene and attaching semantic labels to that geometry), remains a challenging problem that, if solved, would make structured vision understanding and editing more widely accessible. Concurrently, progress in computer vision and machine learning has motivated the pursuit of understanding and digitally reconstructing the surrounding world. Neural metric-semantic understanding is a new and rapidly emerging field that combines differentiable machine learning techniques with physical knowledge from computer vision, e.g., the integration of visual-inertial simultaneous localization and mapping (SLAM), mesh reconstruction, and semantic understanding. In this paper, we summarize recent trends and applications of neural metric-semantic understanding. Starting with an overview of the underlying computer vision and machine learning concepts, we discuss critical aspects of such perception approaches, with particular emphasis on fully leveraging joint semantic and 3D information. We then present important applications of this perception capability, such as novel view synthesis and the manipulation of semantic augmented reality (AR) content. Finally, we conclude with a discussion of the technical implications of the technology under a 5G edge computing scenario.
Funding: Supported by the National Natural Science Foundation of China (No. 62063006), the Natural Science Foundation of Guangxi Province (No. 2023GXNS-FAA026025), the Innovation Fund of Chinese Universities Industry-University-Research (ID: 2021RYC06005), the Research Project for Young and Middle-aged Teachers in Guangxi Universities (ID: 2020KY15013), the Special Research Project of Hechi University (ID: 2021GCC028), and the Project of Outstanding Thousand Young Teachers' Training in Higher Education Institutions of Guangxi, Guangxi Colleges and Universities Key Laboratory of AI and Information Processing (Hechi University), Education Department of Guangxi Zhuang Autonomous Region.
Abstract: Visual simultaneous localization and mapping (SLAM) is crucial in robotics and autonomous driving. However, traditional visual SLAM struggles in dynamic environments. To address this issue, researchers have proposed semantic SLAM, which combines object detection, semantic segmentation, instance segmentation, and visual SLAM. Despite the growing body of literature on semantic SLAM, there is currently a lack of comprehensive research on the integration of object detection and visual SLAM. Therefore, this study gathers information from multiple databases and reviews the relevant literature using specific keywords, focusing on visual SLAM based on object detection. It first discusses the current research status and challenges in this field, highlighting methods for incorporating semantic information from object detection networks into odometry, loop-closure detection, and map construction. It also compares the characteristics and performance of various object-detection-based visual SLAM algorithms. Lastly, it provides an outlook on future research directions and emerging trends in visual SLAM. Research has shown that visual SLAM based on object detection offers significant improvements over traditional SLAM in dynamic point removal, data association, point cloud segmentation, and related techniques; it can improve the robustness and accuracy of the entire SLAM system and can run in real time. With continuous optimization of algorithms and improvements in hardware, object-detection-based visual SLAM has great potential for development.
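One of the techniques the review highlights, dynamic point removal, can be illustrated with a minimal sketch: feature points that fall inside the bounding boxes of detected movable classes are discarded before pose estimation. The class set, box format, and detector interface below are assumptions for illustration, not a specific system from the surveyed literature.

```python
import numpy as np

DYNAMIC_CLASSES = {"person", "car", "bicycle"}  # assumed set of movable objects

def filter_dynamic_keypoints(keypoints: np.ndarray, detections: list) -> np.ndarray:
    """Remove feature points that fall inside any dynamic-object detection.

    keypoints  : (N, 2) array of pixel coordinates (u, v).
    detections : list of dicts {"label": str, "box": (x1, y1, x2, y2)},
                 a generic detector output format assumed for this sketch.
    Returns the static subset of keypoints to be used for pose estimation.
    """
    keep = np.ones(len(keypoints), dtype=bool)
    for det in detections:
        if det["label"] not in DYNAMIC_CLASSES:
            continue
        x1, y1, x2, y2 = det["box"]
        inside = ((keypoints[:, 0] >= x1) & (keypoints[:, 0] <= x2) &
                  (keypoints[:, 1] >= y1) & (keypoints[:, 1] <= y2))
        keep &= ~inside
    return keypoints[keep]

# Toy usage: two of the four key points lie on a detected car and are dropped.
kps = np.array([[50, 60], [120, 200], [300, 310], [400, 80]], dtype=float)
dets = [{"label": "car", "box": (100, 150, 350, 400)}]
print(filter_dynamic_keypoints(kps, dets))  # -> [[ 50.  60.] [400.  80.]]
```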
Abstract: For language learners, vocabulary acquisition is one of the most important tasks, and the same holds for L2 teachers. Recently, some researchers have claimed that semantic mapping is far more effective for vocabulary learning. The present study comparatively investigates two strategies for teaching vocabulary to non-English lower-intermediate learners: the traditional approach of grammatical description and drilling, and semantic mapping. Six new words were taught using these two approaches respectively, and a sample test was administered at the end of the class to check students' mastery of the words. The test results show no significant difference between the learning outcomes of the two teaching approaches, but learners were more engaged when the teacher used the semantic-mapping method. Therefore, a combination of the two approaches may be a good choice for L2 learners.
Funding: This work was supported in part by the National Natural Science Foundation of China (U1864203, 61773234, and 52102464), in part by a project funded by the China Postdoctoral Science Foundation (2019M660622), and in part by the International Science and Technology Cooperation Program of China (2019YFE0100200).
Abstract: High-definition maps have become a vital cornerstone for the navigation of autonomous vehicles in complex traffic scenarios, so their construction has become crucial. Traditional methods that rely on expensive mapping vehicles equipped with high-end sensors are unsuitable for mass map construction because of their high cost. Hence, this paper proposes a new method for creating a high-definition road semantic map from multi-vehicle sensor data. The proposed method uses crowdsourced point-based visual SLAM to align and combine the local maps derived by multiple vehicles. It also allows the extraction process to be replaced with a more sophisticated neural network, achieving more accurate detection than the traditional binarization method. The resulting map consists of road marking points suitable for autonomous vehicle navigation and path-planning tasks. Finally, the method is evaluated on the real-world KAIST urban dataset and the Shougang dataset, demonstrating the level of detail and accuracy of the proposed map, with a mapping error of 0.369 m under ideal conditions.
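The core of combining local maps from different vehicles is estimating a rigid transform between corresponding road-marking points. The sketch below is a simplified stand-in for the paper's point-based visual SLAM alignment: it recovers a least-squares rotation and translation from known point correspondences with the standard SVD-based (Kabsch) method; correspondence search, scale, and outlier handling are omitted.

```python
import numpy as np

def estimate_rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid transform (R, t) such that R @ src[i] + t ~= dst[i],
    using the SVD-based Kabsch method on corresponding 3D points."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)       # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                  # correct a possible reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

# Toy usage: a local map rotated by 30 deg about z and shifted is re-aligned.
rng = np.random.default_rng(0)
local_a = rng.uniform(-10, 10, size=(100, 3))        # road-marking points, vehicle A
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
local_b = local_a @ R_true.T + np.array([5.0, -2.0, 0.1])   # same points, vehicle B
R, t = estimate_rigid_transform(local_a, local_b)
print(np.allclose(R, R_true), np.allclose(t, [5.0, -2.0, 0.1]))  # True True
```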
Funding: Supported by the National Natural Science Foundation of China (Nos. 61473042 and 61105092) and the Beijing Higher Education Young Elite Teacher Project (No. YETP1215).
Abstract: In recent years, there has been considerable interest in incorporating semantics into simultaneous localization and mapping (SLAM) systems. This paper presents an approach to generating an outdoor, large-scale, dense 3D semantic map based on binocular stereo vision. The inputs to the system are stereo color images from a moving vehicle. First, the dense 3D space around the vehicle is reconstructed and the motion of the camera is estimated by visual odometry. Meanwhile, semantic segmentation is performed online using deep learning, and the semantic labels are also used to verify the feature matching in visual odometry. These three processes yield the motion, depth, and semantic label of every pixel in the input views. Then, voxel conditional random field (CRF) inference is introduced to fuse the semantic labels into voxels. After that, we present a method to remove moving objects by incorporating the semantic labels, which improves motion segmentation accuracy. Finally, a dense 3D semantic map of an urban environment is generated from an arbitrarily long image sequence. We evaluate our approach on the KITTI vision benchmark, and the results show that the proposed method is effective.
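The label-to-voxel fusion step can be illustrated with a minimal sketch that simply accumulates the per-pixel class probabilities of the 3D points falling into each voxel and takes the arg-max. This stands in only for the accumulation/unary part of the paper's voxel CRF; the pairwise CRF refinement is omitted, and the voxel size and data layout are illustrative assumptions.

```python
import numpy as np

def fuse_labels_into_voxels(points: np.ndarray, class_probs: np.ndarray,
                            voxel_size: float = 0.2):
    """Accumulate per-pixel class probabilities into a sparse voxel grid.

    points      : (N, 3) 3D points from the stereo reconstruction.
    class_probs : (N, C) per-point class probabilities from the
                  semantic segmentation network.
    Returns a dict mapping voxel index (ix, iy, iz) -> arg-max class id.
    """
    voxel_ids = np.floor(points / voxel_size).astype(np.int64)
    scores = {}
    for vid, probs in zip(map(tuple, voxel_ids), class_probs):
        scores[vid] = scores.get(vid, 0.0) + probs     # sum evidence per voxel
    return {vid: int(np.argmax(total)) for vid, total in scores.items()}

# Toy usage: two nearby points vote their shared voxel toward class 2.
pts = np.array([[0.05, 0.05, 0.0], [0.10, 0.15, 0.05], [1.0, 1.0, 1.0]])
probs = np.array([[0.1, 0.2, 0.7], [0.2, 0.1, 0.7], [0.8, 0.1, 0.1]])
print(fuse_labels_into_voxels(pts, probs))
# -> {(0, 0, 0): 2, (5, 5, 5): 0}
```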