Abstract: A three-dimensional (3D) digitization method that fuses RGBD (red, green, blue and depth) information with shape from shading (SFS) is proposed. First, a Kinect depth camera is used to capture color and depth images of the surface of the measured object under indoor natural lighting. Next, the captured depth image is processed with a bilateral filter, and the surface normals of the filtered depth map are estimated. An illumination model is then established and solved, and finally a global cost function is constructed to optimize the depth information of the target object. Experimental results show that the method recovers the detail of the reconstructed object well under indoor natural lighting.
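As a rough illustration of the preprocessing stage described above, the sketch below bilaterally filters a depth map and estimates per-pixel surface normals from its gradients. The filter parameters and the orthographic normal approximation are illustrative assumptions; the paper's illumination model and global cost function are not reproduced here.

```python
# A minimal sketch, assuming a float32 depth map from a Kinect-style sensor.
import cv2
import numpy as np

def estimate_normals(depth: np.ndarray) -> np.ndarray:
    """Smooth a float32 depth map and return an HxWx3 unit-normal map."""
    # Edge-preserving smoothing: bilateral filtering keeps depth
    # discontinuities sharp while suppressing sensor noise.
    # (Parameter values here are assumptions, not the paper's settings.)
    smoothed = cv2.bilateralFilter(depth.astype(np.float32),
                                   d=9, sigmaColor=30.0, sigmaSpace=7.0)
    # Depth gradients approximate the surface slope in x and y.
    dzdx = cv2.Sobel(smoothed, cv2.CV_32F, 1, 0, ksize=3)
    dzdy = cv2.Sobel(smoothed, cv2.CV_32F, 0, 1, ksize=3)
    # Under an orthographic approximation, n ~ (-dz/dx, -dz/dy, 1).
    normals = np.dstack((-dzdx, -dzdy, np.ones_like(smoothed)))
    norm = np.linalg.norm(normals, axis=2, keepdims=True)
    return normals / np.maximum(norm, 1e-8)
```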
Funding: This work was financially supported by the National Natural Science Foundation of China (61720106012 and 61403215), the Foundation of State Key Laboratory of Robotics (2006-003), and the Fundamental Research Funds for the Central Universities.
Abstract: The autonomous exploration and mapping of an unknown environment is useful in a wide range of applications and thus holds great significance. Existing methods mostly use range sensors to generate two-dimensional (2D) grid maps. Red/green/blue-depth (RGB-D) sensors provide both color and depth information on the environment, thereby enabling the generation of a three-dimensional (3D) point cloud map that is intuitive for human perception. In this paper, we present a systematic approach with dual RGB-D sensors to achieve the autonomous exploration and mapping of an unknown indoor environment. With the synchronized and processed RGB-D data, location points were generated, and a 3D point cloud map and a 2D grid map were incrementally built. Next, the exploration was modeled as a partially observable Markov decision process. Partial map simulation and global frontier search methods were combined for autonomous exploration, and dynamic action constraints were utilized in motion control. In this way, the local optimum can be avoided and the exploration efficacy can be ensured. Experiments with single-connected and multi-branched regions demonstrated the high robustness, efficiency, and superiority of the developed system and methods.
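To make the frontier-search component concrete, here is a minimal sketch of detecting frontier cells (free cells bordering unknown space) on a 2D occupancy grid. The cell encoding (0 free, -1 unknown, 1 occupied) is an assumed convention, and the POMDP modeling and partial map simulation are beyond this snippet.

```python
# A minimal sketch of global frontier search on a 2D occupancy grid.
import numpy as np

def find_frontiers(grid: np.ndarray) -> list[tuple[int, int]]:
    """Return (row, col) coordinates of free cells bordering unknown space."""
    frontiers = []
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != 0:          # only free cells can be frontiers
                continue
            # A free cell with at least one unknown 4-neighbor is a frontier.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] == -1:
                    frontiers.append((r, c))
                    break
    return frontiers
```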
Abstract: Appearance-based gaze estimation methods mostly operate on two-dimensional RGB (red, green, blue) images, and their accuracy drops when the head moves freely. In addition, current appearance-based methods built on convolutional neural networks commonly use pooling to enlarge the receptive field of the pixels in the feature map, which causes a loss of feature-map information. A multimodal-fusion gaze estimation model based on dilated convolutional neural networks is therefore proposed. In this model, dilated convolutions are used to design a network called GENet (gaze estimation network) that extracts feature maps from RGB and depth images of the eyes, and the fully connected layers of the network automatically fuse the head pose with the feature maps of the two image modalities to estimate gaze. The model is validated on the public Eyediap dataset and compared with other gaze estimation models. Experimental results show that the proposed model can accurately estimate gaze direction under free head movement.
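The sketch below illustrates the two design points of this model in PyTorch: dilated convolutions in place of pooling, and fully connected fusion of the head pose with the RGB and depth eye features. The branch layout and layer sizes are illustrative assumptions and do not reproduce the published GENet architecture.

```python
# A minimal sketch, assuming eye patches as input and a 3-DoF head-pose vector.
import torch
import torch.nn as nn

class EyeBranch(nn.Module):
    """Feature extractor for one modality (RGB or depth eye patch)."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            # Dilated convolutions grow the receptive field without pooling,
            # so no spatial information is discarded along the way.
            nn.Conv2d(32, 64, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=4, dilation=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, x):
        return self.features(x).flatten(1)

class GazeNet(nn.Module):
    """Fuses RGB features, depth features, and head pose via FC layers."""
    def __init__(self):
        super().__init__()
        self.rgb = EyeBranch(3)
        self.depth = EyeBranch(1)
        self.head = nn.Sequential(
            nn.Linear(64 + 64 + 3, 128), nn.ReLU(),
            nn.Linear(128, 2),  # gaze direction as (yaw, pitch)
        )

    def forward(self, rgb, depth, pose):
        fused = torch.cat([self.rgb(rgb), self.depth(depth), pose], dim=1)
        return self.head(fused)
```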
Abstract: To address the problem that dynamic objects (such as pedestrians, vehicles, and animals) in real scenes degrade the localization and mapping accuracy of visual SLAM (Simultaneous Localization and Mapping), the YOLOv3-ORB-SLAM3 algorithm is proposed on the basis of ORB-SLAM3 (Oriented FAST and Rotated BRIEF-Simultaneous Localization and Mapping 3). The algorithm adds a semantic thread to ORB-SLAM3 and adopts a dual-thread mechanism that extracts dynamic- and static-scene features separately: the semantic thread uses YOLOv3 to detect dynamic objects in the scene and removes outlier feature points from the detected dynamic regions, while the tracking thread extracts ORB features from the scene and, combined with the semantic information, passes only the static-scene features to the back end, thereby eliminating the interference of dynamic scenes and improving the localization accuracy of the visual SLAM algorithm. Validation on the TUM (Technical University of Munich) dataset shows that, compared with ORB-SLAM3, YOLOv3-ORB-SLAM3 reduces the ATE (Absolute Trajectory Error) on dynamic sequences by about 30% in monocular mode and by about 10% in RGB-D (red, green and blue-depth) mode, with no obvious reduction on static sequences.
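The sketch below illustrates the semantic thread's core operation: rejecting ORB feature points that fall inside detection boxes of dynamic classes so that only static-scene features reach the back end. The box format (x1, y1, x2, y2, label) and the class list are assumptions for illustration; the actual system uses YOLOv3 detections and ORB-SLAM3's own feature pipeline.

```python
# A minimal sketch, assuming detections arrive as (x1, y1, x2, y2, label).
import cv2

DYNAMIC_CLASSES = {"person", "car", "dog"}  # assumed dynamic-class list

def filter_static_keypoints(gray, boxes):
    """Extract ORB features and keep only those outside dynamic boxes."""
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints = orb.detect(gray, None)
    static = []
    for kp in keypoints:
        x, y = kp.pt
        inside_dynamic = any(
            x1 <= x <= x2 and y1 <= y <= y2
            for (x1, y1, x2, y2, label) in boxes
            if label in DYNAMIC_CLASSES
        )
        if not inside_dynamic:
            static.append(kp)
    # Descriptors are computed only for the retained static keypoints.
    return orb.compute(gray, static)
```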
Abstract: Perception and manipulation tasks for robotic manipulators involving highly cluttered objects have become increasingly in demand for achieving more efficient problem solving in modern industrial environments. However, most of the available methods for performing such cluttered tasks perform poorly, mainly due to their inability to adapt to changes in the environment and in the handled objects. Here, we propose a new, near real-time approach to suction-based grasp point estimation in a highly cluttered environment by employing an affordance-based approach. Compared to the state-of-the-art, our proposed method offers two distinctive contributions. First, we use a modified deep neural network backbone for semantic segmentation to classify the pixels of the input red, green, blue and depth (RGBD) image, which is then used to produce an affordance map: a pixel-wise probability map representing the probability of a successful grasping action in those particular pixel regions. Second, we incorporate high-speed semantic segmentation into the system, which gives our solution a lower computational time. The approach needs no prior knowledge or models of the objects, since it removes the pose estimation and object recognition steps entirely, unlike most current approaches, and instead grasps first and recognizes later, which makes it object-agnostic. The system was designed for household objects, but it can easily be extended to any kind of object provided that the right dataset is used for training the models. Experimental results show the benefit of our approach, which achieves a precision of 88.83%, compared to the 83.4% precision of the current state-of-the-art.
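As a small illustration of how a pixel-wise affordance map drives grasping, the sketch below selects a suction point as the highest-probability pixel after light smoothing. The smoothing step and kernel size are assumptions for illustration, not the paper's exact pipeline.

```python
# A minimal sketch, assuming an HxW affordance map of grasp-success
# probabilities produced by the segmentation network.
import cv2
import numpy as np

def select_grasp_point(affordance: np.ndarray) -> tuple[int, int]:
    """Return the (row, col) of the best suction point in an HxW map."""
    # Light Gaussian smoothing prefers coherent high-probability regions
    # over isolated noisy peaks.
    smoothed = cv2.GaussianBlur(affordance.astype(np.float32), (11, 11), 0)
    r, c = np.unravel_index(np.argmax(smoothed), smoothed.shape)
    return int(r), int(c)
```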