Abstract: This paper deals with a binocular 3-D computer vision system based on the hierarchical matching of edge features. The Frei and Chen operator is used to extract edges. The average gradients of an image obtained by two isotropic operators are non-uniformly quantized and thresholded in angle, and edge features are extracted after passing through a pre-emphasis transfer function that equalizes the effect of noise. Binary edge images are decomposed into a pyramid structure which is stored and searched using Iliffe's location method. Corresponding points are used to determine range data by triangulation based on an improved version of Trivedi's formula. For calibration, the authors set the optical axes of the two cameras parallel to simplify the calculation, and a third-order Householder transform is used to solve the compatible coupled equations.
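Because the calibration fixes the two optical axes parallel, range recovery reduces to standard disparity-based triangulation. Below is a minimal sketch of that step; the focal length, baseline, and matched pixel coordinates are illustrative assumptions, and neither Trivedi's original formula nor the paper's improved version is reproduced here.

```python
import numpy as np

def triangulate_parallel(xl, xr, y, f, B):
    """Range from a matched edge-point pair under parallel optical axes.

    xl, xr : x-coordinates (pixels) of the corresponding point in the left
             and right images, measured from each principal point.
    y      : common y-coordinate (rows align when the axes are parallel).
    f      : focal length in pixels.
    B      : baseline (distance between the two camera centres), in metres.
    """
    d = xl - xr                      # disparity
    if d <= 0:
        raise ValueError("non-positive disparity: point at infinity or bad match")
    Z = f * B / d                    # depth along the optical axis
    X = xl * Z / f                   # lateral position in the left-camera frame
    Y = y * Z / f
    return np.array([X, Y, Z])

# Hypothetical matched pair: f = 800 px, baseline 0.12 m
print(triangulate_parallel(xl=152.0, xr=131.5, y=40.0, f=800.0, B=0.12))
```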
Funding: the National Natural Science Foundation of China (No. 31760345).
Abstract: As agricultural internet of things (IoT) technology has evolved, smart agricultural robots need both flexibility and adaptability when moving in complex field environments. In this paper, we propose the concept of a vision-based navigation system for the agricultural IoT and a binocular vision navigation algorithm for smart agricultural robots, which fuses the edge contour and the height information of crop rows in images to extract the navigation parameters. First, the speeded-up robust feature (SURF) extraction and matching algorithm is used to obtain feature point pairs from the green crop-row images observed by the binocular parallel vision system. Then a confidence density image is constructed by integrating the enhanced elevation image and the corresponding binarized crop-row image, so that the edge contour and the height information of the crop row are fused to extract the navigation parameters (θ, d) based on the model of a smart agricultural robot. Finally, five navigation network instruction sets are designed based on the navigation angle θ and the lateral distance d, which represent the basic movements of a certain type of smart agricultural robot working in a field. Simulated experiments in the laboratory show that the proposed algorithm is effective, with small turning errors and low standard deviations, and can serve as a valuable reference for the further practical application of binocular vision navigation systems in smart agricultural robots within the agricultural IoT system.
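To make the navigation parameters concrete, the sketch below fits a straight navigation line to crop-row pixels and reads off a heading angle θ and a lateral offset d. The least-squares line fit, image geometry, and robot reference point are illustrative assumptions and stand in for the paper's confidence-density procedure.

```python
import numpy as np

def navigation_parameters(row_points, img_width, img_height):
    """Fit a navigation line to crop-row pixels and derive (theta, d).

    row_points : (N, 2) array of (u, v) pixel coordinates believed to lie on
                 the crop row (e.g. sampled from a binarized row mask).
    Returns the heading angle theta (rad, 0 = straight ahead) and the signed
    lateral offset d (pixels) of the line from the image centre line.
    """
    u, v = row_points[:, 0], row_points[:, 1]
    # Least-squares line u = a*v + b; fitting u as a function of v avoids the
    # near-vertical singularity of a forward-looking crop row.
    A = np.stack([v, np.ones_like(v)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, u, rcond=None)
    theta = np.arctan(a)                  # deviation from the image's vertical axis
    u_bottom = a * img_height + b         # where the line meets the bottom image row
    d = u_bottom - img_width / 2.0        # signed lateral offset from the centre line
    return theta, d

pts = np.array([[320, 100], [316, 200], [311, 300], [307, 400]], dtype=float)
print(navigation_parameters(pts, img_width=640, img_height=480))
```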
Funding: supported by the Thailand Research Fund and Solimac Automation Co., Ltd. under the Research and Researchers for Industry Program (RRI), Grant No. MSD56I0098, and by the Office of the Higher Education Commission under the National Research University Project of Thailand.
Abstract: In this paper, we present a robot-vision-based system for coordinate measurement of feature points on large-scale automobile parts. Our system consists of an industrial 6-DOF robot mounted with a CCD camera and a PC. The system controls the robot into the area of the feature points, and images of the feature points to be measured are acquired by the camera mounted on the robot. The 3D positions of the feature points are obtained by model-based pose estimation applied to these images. The measured positions of all feature points are then transformed into the reference coordinate frame of the feature points, whose positions are obtained from a coordinate measuring machine (CMM). Finally, the point-to-point distances between the measured feature points and the reference feature points are calculated and reported. The results show that the root mean square error (RMSE) of the values measured by our system is less than 0.5 mm. Our system is adequate for automobile assembly and can perform faster than conventional methods.
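The accuracy check described above amounts to rigidly mapping the measured points into the CMM reference frame and then computing point-to-point distances and their RMSE. A minimal sketch follows, using a standard SVD-based (Kabsch) alignment in place of whatever transform the authors actually used; the sample coordinates are hypothetical.

```python
import numpy as np

def rigid_align(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q (Kabsch)."""
    Pc, Qc = P - P.mean(0), Q - Q.mean(0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = Q.mean(0) - R @ P.mean(0)
    return R, t

def rmse_against_reference(measured, reference):
    """Point-to-point distances and RMSE after mapping measured points
    into the reference (CMM) coordinate frame."""
    R, t = rigid_align(measured, reference)
    aligned = measured @ R.T + t
    dists = np.linalg.norm(aligned - reference, axis=1)
    return dists, float(np.sqrt(np.mean(dists ** 2)))

# Hypothetical feature points (mm): robot-vision measurements vs. CMM reference
meas = np.array([[0.1, 0.0, 0.2], [100.2, 0.1, 0.0], [0.0, 99.8, 0.3]])
ref  = np.array([[0.0, 0.0, 0.0], [100.0, 0.0, 0.0], [0.0, 100.0, 0.0]])
dists, rmse = rmse_against_reference(meas, ref)
print(dists, rmse)
```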
Funding: This work was supported in part by the National Natural Science Foundation of China (No. 61976095) and the Science and Technology Planning Project of Guangdong Province, China (No. 2018B030323026).
Abstract: 3D shape recognition has drawn much attention in recent years, and the view-based approach performs best of all. However, current multi-view methods are almost all fully supervised, and their pretraining models are almost all based on ImageNet. Although the pretraining results on ImageNet are quite impressive, there is still a significant discrepancy between multi-view datasets and ImageNet: multi-view datasets naturally retain rich 3D information. In addition, large-scale datasets such as ImageNet require considerable cleaning and annotation work, so it is difficult to produce a second such dataset. In contrast, unsupervised learning methods can learn general feature representations without any extra annotation. To this end, we propose a three-stage unsupervised joint pretraining model. Specifically, we decouple the final representations into three fine-grained representations: data augmentation is used to obtain pixel-level representations within each view, spatially invariant features are strengthened at the view level, and global information is exploited at the shape level through a novel extract-and-swap module. Experimental results demonstrate that the proposed method gains significantly on 3D object classification and retrieval tasks and generalizes to cross-dataset tasks.
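The view-level stage is described as strengthening spatially invariant features across views. One common way to realise such an objective is a contrastive loss over paired embeddings, sketched below in PyTorch; the encoder, batch layout, and temperature are illustrative assumptions, not the paper's architecture or exact loss.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    """Contrastive loss pulling two embeddings of the same view together
    and pushing apart embeddings of different views in the batch.

    z1, z2 : (B, D) feature batches from two augmentations of the same views.
    """
    B = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D) unit vectors
    sim = z @ z.t() / temperature                         # cosine similarity logits
    sim.fill_diagonal_(float("-inf"))                     # ignore self-pairs
    # The positive for sample i is its other augmentation, offset by B.
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)

# Hypothetical embeddings of 8 rendered views under two augmentations
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent(z1, z2).item())
```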
Abstract: Remarkable progress has been made in self-supervised monocular depth estimation (SS-MDE) by exploring cross-view consistency, e.g., photometric consistency and 3D point cloud consistency. However, these consistencies are very vulnerable to illumination variance, occlusions, texture-less regions, and moving objects, making them not robust enough to deal with various scenes. To address this challenge, we study two kinds of robust cross-view consistency in this paper. First, the spatial offset field between adjacent frames is obtained by reconstructing the reference frame from its neighbors via deformable alignment, and is used to align the temporal depth features via a depth feature alignment (DFA) loss. Second, the 3D point clouds of each reference frame and its nearby frames are calculated and transformed into voxel space, where the point density in each voxel is calculated and aligned via a voxel density alignment (VDA) loss. In this way, we exploit the temporal coherence in both depth feature space and 3D voxel space for SS-MDE, shifting the “point-to-point” alignment paradigm to a “region-to-region” one. Compared with the photometric consistency loss and the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust owing to the strong representation power of deep features and the high tolerance of voxel density to the aforementioned challenges. Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques. Extensive ablation studies and analysis validate the effectiveness of the proposed losses, especially in challenging scenes. The code and models are available at https://github.com/sunnyHelen/RCVC-depth.
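The voxel density alignment idea can be illustrated as follows: back-project two depth maps into point clouds (with the source frame assumed already transformed into the reference frame), count points per voxel over a fixed grid, and penalise the density difference. The intrinsics, voxel size, bounding box, and L1 penalty below are illustrative assumptions, not the authors' exact VDA loss.

```python
import numpy as np

def backproject(depth, K):
    """Turn a depth map (H, W) into an (N, 3) point cloud using intrinsics K."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.ravel()
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

def voxel_density(points, voxel=0.5, bounds=((-10, 10), (-5, 5), (0, 40))):
    """Count points per voxel inside a fixed bounding box (points outside are ignored)."""
    bins = [np.arange(lo, hi + voxel, voxel) for lo, hi in bounds]
    hist, _ = np.histogramdd(points, bins=bins)
    return hist

def vda_loss(depth_ref, depth_src_in_ref, K):
    """Mean absolute difference between the two voxel-density grids
    ("region-to-region" comparison instead of point-to-point matching)."""
    d_ref = voxel_density(backproject(depth_ref, K))
    d_src = voxel_density(backproject(depth_src_in_ref, K))
    return float(np.abs(d_ref - d_src).mean())

# Hypothetical intrinsics and two nearly identical depth maps
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
d1 = np.full((480, 640), 8.0)
d2 = d1 + 0.05
print(vda_loss(d1, d2, K))
```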