Funding: National Natural Science Foundation of China (41801379).
Abstract: The RGB-D camera is a new type of sensor that can simultaneously obtain depth and texture information of an unknown 3D scene, and such cameras have been widely applied in various fields. In practice, before implementing such applications with an RGB-D camera, it is necessary to calibrate it first. To the best of our knowledge, no systematic summary of RGB-D camera calibration methods currently exists. Therefore, a systematic review of RGB-D camera calibration is presented as follows. First, the measurement mechanism and the principles underlying RGB-D camera calibration methods are presented. Subsequently, as some applications need to fuse depth and color information, calibration methods for the relative pose between the depth camera and the RGB camera are introduced in Section 2. The depth correction models for RGB-D cameras are then summarized and compared in Section 3. Third, considering that the field of view of an RGB-D camera is small and limits some applications, we discuss calibration models for the relative pose among multiple RGB-D cameras in Section 4. Finally, directions and trends in RGB-D camera calibration are discussed.
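The depth-to-color relative-pose calibration mentioned above can be illustrated with a minimal sketch: a depth pixel is back-projected to a 3D point with the depth camera's intrinsics, transformed by the calibrated rotation and translation, and re-projected with the color camera's intrinsics. All intrinsic and extrinsic values below are hypothetical placeholders, not parameters from the review.

```python
# Sketch: map a depth pixel into the color image using calibrated
# intrinsics (fx, fy, cx, cy) and a depth-to-color extrinsic (R, t).
# All numeric values are hypothetical examples.

def backproject(u, v, z, fx, fy, cx, cy):
    """Depth pixel (u, v) with depth z -> 3D point in the depth camera frame."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def transform(p, R, t):
    """Apply a rigid transform: R is a 3x3 row-major matrix, t a 3-vector."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def project(p, fx, fy, cx, cy):
    """3D point in the color camera frame -> pixel in the color image."""
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)

def depth_pixel_to_color_pixel(u, v, z, depth_K, color_K, R, t):
    """Full chain: depth pixel -> 3D point -> color camera frame -> color pixel."""
    p_depth = backproject(u, v, z, *depth_K)
    p_color = transform(p_depth, R, t)
    return project(p_color, *color_K)
```

With identity extrinsics and equal intrinsics, a pixel maps (up to floating-point error) to itself, which is a quick sanity check for an implementation of this chain.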
Abstract: Multi-view dynamic three-dimensional reconstruction has typically required the use of custom shutter-synchronized camera rigs in order to capture scenes containing rapid movements or complex topology changes. In this paper, we demonstrate that multiple unsynchronized low-cost RGB-D cameras can be used for the same purpose. To alleviate issues caused by unsynchronized shutters, we propose a novel depth frame interpolation technique that allows synchronized data capture from highly dynamic 3D scenes. To manage the resulting huge number of input depth images, we also introduce an efficient moving least squares-based volumetric reconstruction method that generates triangle meshes of the scene. Our approach does not store the reconstruction volume in memory, making it memory-efficient and scalable to large scenes. Our implementation is completely GPU based and works in real time. The results shown herein, obtained with real data, demonstrate the effectiveness of our proposed method and its advantages compared to state-of-the-art approaches.
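The paper's depth-frame interpolation is considerably more sophisticated, but the basic idea of synthesizing a depth frame at a common timestamp between two unsynchronized shutters can be sketched as per-pixel linear interpolation. This is a simplification under stated assumptions: invalid-depth pixels and occlusions are ignored.

```python
def interpolate_depth(frame_a, frame_b, t_a, t_b, t):
    """Per-pixel linear interpolation between two depth frames captured at
    times t_a < t_b, evaluated at time t in [t_a, t_b].
    Frames are lists of rows of depth values (e.g. in millimetres)."""
    w = (t - t_a) / (t_b - t_a)          # interpolation weight in [0, 1]
    return [[(1 - w) * a + w * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]
```

For example, interpolating two 1x2 frames captured at 0.00 s and 0.10 s to the shared timestamp 0.05 s yields the per-pixel midpoint of the two readings.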
Funding: Supported by the JST CREST “Behavior Understanding based on Intention-Gait Model” project.
Abstract: This paper proposes a depth measurement error model for consumer depth cameras such as the Microsoft Kinect, and a corresponding calibration method. These devices were originally designed as video game interfaces, and their output depth maps usually lack sufficient accuracy for 3D measurement. Models have been proposed to reduce these depth errors, but they only consider camera-related causes. Since the depth sensors are based on projector-camera systems, we should also consider projector-related causes. Also, previous models require disparity observations, which are usually not output by such sensors, so they cannot be employed in practice. We give an alternative error model for projector-camera based consumer depth cameras, based on their depth measurement algorithm and the intrinsic parameters of the camera and the projector; it does not need disparity values. We also give a corresponding new parameter estimation method which simply needs observation of a planar board. Our calibrated error model allows use of a consumer depth sensor as a 3D measuring device. Experimental results show the validity and effectiveness of the error model and calibration procedure.
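The paper's error model involves projector intrinsics and is more elaborate than what fits here; the sketch below shows only the common first step of plane-based depth calibration: fitting a plane z = a·x + b·y + c to observed points of a planar board by least squares, then computing per-point depth residuals, which an error model would subsequently be fitted to. The 3x3 normal equations are solved with Cramer's rule to keep the sketch dependency-free.

```python
def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to 3D points.
    Solves the 3x3 normal equations by Cramer's rule."""
    n = len(points)
    sx = sum(p[0] for p in points); sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] ** 2 for p in points); syy = sum(p[1] ** 2 for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points); syz = sum(p[1] * p[2] for p in points)
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    b = [sxz, syz, sz]
    d = det3(A)
    coeffs = []
    for i in range(3):
        Ai = [row[:] for row in A]
        for r in range(3):
            Ai[r][i] = b[r]          # replace column i with the RHS
        coeffs.append(det3(Ai) / d)
    return tuple(coeffs)             # (a, b, c)

def depth_residuals(points, plane):
    """Per-point depth residual against the fitted board plane."""
    a, b, c = plane
    return [p[2] - (a * p[0] + b * p[1] + c) for p in points]
```

For points that lie exactly on a plane, the fit recovers the coefficients and the residuals are zero up to floating-point error; with real sensor data, the residual pattern is the signal the error model explains.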
Funding: Supported by the National Natural Science Foundation of China [grant number 31871527].
Abstract: Modern consumer-grade RGB-D cameras provide dense depth estimation and high frame rates, and they have been widely used in agriculture. However, depth distortion occurs when using an RGB-D camera. In order to address this issue, this paper proposes a novel approach to correct distorted depth images by fitting the relationship between the true distances and the depth values from the depth images. This study takes the structured-light camera ASUS Xtion PRO LIVE as an example and develops a system for obtaining a series of depth images at different distances from the target plane. A comparative analysis is conducted between the images before and after correction to evaluate the performance of the distortion correction. The point cloud of the corrected maize leaves coincides well with the original plant. This approach improves the accuracy of depth measurement and optimizes the subsequent use of the depth camera in crop reconstruction and phenotyping studies.
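The correction idea above — fitting the relationship between true distances and raw depth values — can be sketched as an ordinary least-squares line fit. The paper may well use a higher-order model; a first-order fit over hypothetical calibration pairs is shown here purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = k*x + b (closed form)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    k = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return k, my - k * mx

# Hypothetical calibration pairs: (raw depth reading, true distance), in metres.
raw  = [0.52, 1.04, 1.55, 2.08, 2.61]
true = [0.50, 1.00, 1.50, 2.00, 2.50]
k, b = fit_line(raw, true)

def correct_depth(d):
    """Apply the fitted correction to a raw depth reading."""
    return k * d + b
```

After fitting, every subsequent raw reading is passed through `correct_depth` before use; a polynomial of higher degree would follow the same fit-then-apply pattern.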
Funding: National Natural Science Foundation of China (31500355, 31770468); Shanghai Science and Technology Innovation Action Plan (19DZ1203801); China Three Gorges Corporation Science and Technology Fund (20203138); supported by Grant No. CZ.02.1.01/0.0/0.0/16_019/0000803 financed by OP RDE; Scientific Grant Agency (VEGA) of the Ministry of Education, Science, Research and Sport of the Slovak Republic and the Slovak Academy of Sciences (grant number 1/0335/20).
Funding: The National Natural Science Foundation of China (Nos. 61773064, 61503028); Graduate Technological Innovation Project of Beijing Institute of Technology (No. 2018CX10022); National High Tech. Research and Development Program of China (No. 2015AA043202).
Abstract: In narrow, submarine, unstructured environments, existing localization approaches, such as GPS measurement, dead-reckoning, acoustic positioning, and artificial landmark-based methods, are hard to use for multiple small-scale underwater robots. Therefore, this paper proposes a novel RGB-D camera and Inertial Measurement Unit (IMU) fusion-based cooperative and relative close-range localization approach for special environments, such as underwater caves. Owing to the zero-radius rotation movement, cooperative localization of Multiple Turtle-inspired Amphibious Spherical Robots (MTASRs) is realized. Firstly, we present an efficient Histogram of Oriented Gradients (HOG) and Color Names (CNs) fusion feature extracted from color images of TASRs. Then, by training a Support Vector Machine (SVM) classifier with this fusion feature, an automatic recognition method for TASRs is developed. Secondly, an RGB-D camera-based measurement model is obtained from the depth map. In order to realize the cooperative and relative close-range localization of MTASRs, the MTASRs model is established with the RGB-D camera and IMU. Finally, the depth measurement in water is corrected, and the efficiency of the RGB-D camera for underwater applications is validated. Experiments on the proposed localization method with three robots were conducted, and the results verified the feasibility of the proposed method for MTASRs.
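The paper corrects the depth measurement in water; the abstract does not state the model, so the sketch below uses a common first-order approximation — an assumption here, not the paper's calibrated correction — in which refraction makes submerged objects appear closer than they are, so the raw reading is scaled up by the refractive index of water (about 1.33).

```python
N_WATER = 1.33  # approximate refractive index of fresh water

def correct_underwater_depth(d_measured, n=N_WATER):
    """First-order underwater depth correction (illustrative assumption):
    refraction compresses the apparent range, so the raw reading is
    scaled up by the refractive index n."""
    return n * d_measured
```

A real system would replace this single scale factor with coefficients fitted from in-water calibration targets at known distances.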
Funding: Supported by the Foundation of Henan Key Laboratory of Underwater Intelligent Equipment under Grant No. KL02C2105; Project of SongShan Laboratory under Grant No. YYJC062022012; Training Plan for Young Backbone Teachers in Colleges and Universities in Henan Province under Grant No. 2021GGJS077; Key Scientific Research Projects of Colleges and Universities in Henan Province under Grant No. 22A460022; North China University of Water Resources and Electric Power Young Backbone Teacher Training Project under Grant No. 2021-125-4.
Abstract: With the continuous development of the economy and society, plastic pollution in rivers, lakes, oceans, and other bodies of water is increasingly severe, posing a serious challenge to underwater ecosystems. Effective cleaning up of underwater litter by robots relies on accurately identifying and locating the plastic waste. However, underwater optical images often suffer from significant challenges such as noise interference, low contrast, and blurred textures. A weighted fusion-based algorithm for enhancing the quality of underwater images is proposed, which combines weighted logarithmic transformation, adaptive gamma correction, an improved multi-scale Retinex (MSR) algorithm, and the contrast-limited adaptive histogram equalization (CLAHE) algorithm. The proposed algorithm improves brightness, contrast, and color recovery and enhances detail features, resulting in better overall image quality. A network framework based on the YOLOv5 model is also proposed in this article. MobileViT is used as the backbone of the network framework, a detection layer is added to improve the detection capability for small targets, and self-attention and mixed-attention modules are introduced to enhance the recognition of important features. The cross stage partial (CSP) structure is employed in the spatial pyramid pooling (SPP) section to enrich feature information, and the complete intersection over union (CIOU) loss is replaced with the focal efficient intersection over union (EIOU) loss to accelerate convergence while improving regression accuracy. Experimental results proved that the target recognition algorithm achieved a recognition accuracy of 0.913 and ensured a recognition speed of 45.56 fps. Subsequently, a red, green, blue and depth (RGB-D) camera was used to construct a system for identifying and locating underwater plastic waste. Experiments were conducted underwater for recognition, localization, and error analysis. The experimental results demonstrate the effectiveness of the proposed method for identifying and locating underwater plastic waste, with good localization accuracy.
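Two of the enhancement ingredients named above — gamma correction and weighted fusion of differently enhanced versions of an image — can be sketched on a toy grayscale image. The actual pipeline also includes the log transform, MSR, and CLAHE, and the weights below are hypothetical.

```python
def gamma_correct(img, gamma):
    """Gamma correction of a grayscale image with values in [0, 255];
    gamma < 1 brightens shadows, gamma > 1 darkens them."""
    return [[255.0 * (p / 255.0) ** gamma for p in row] for row in img]

def weighted_fuse(images, weights):
    """Pixel-wise weighted fusion of equally sized grayscale images."""
    total = sum(weights)
    h, w = len(images[0]), len(images[0][0])
    return [[sum(wt * im[r][c] for wt, im in zip(weights, images)) / total
             for c in range(w)] for r in range(h)]
```

A minimal use: brighten a dark image with gamma 0.5, then fuse the original and the brightened version with hypothetical weights 0.4 and 0.6 so the result keeps some of the original contrast.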
Funding: The National Key Research and Development Project of China (2017YFC0804401); National Natural Science Foundation of China (U1909204).
Abstract: Background: Large screen visualization systems have been widely utilized in many industries. Such systems can help illustrate the working states of different production systems. However, efficient interaction with such systems is still a focus of related research. Methods: In this paper, we propose a touchless interaction system based on an RGB-D camera using a novel bone-length constraining method. The proposed method optimizes the joint data collected from RGB-D cameras, producing more accurate and more stable results on very noisy data. The user can customize the system by modifying its finite-state machine and reuse gestures across multiple scenarios, reducing the number of gestures that need to be designed and memorized. Results/Conclusions: The authors tested the system in two cases. In the first case, we illustrate a process in which we improved the gesture designs in our system and evaluated them through a user study. In the second case, we applied the system in the mining industry and conducted a user study, in which users reported that the system is easy to use.
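The paper's bone-length constraining method optimizes over the whole skeleton; the core geometric step can be sketched as snapping each noisy child joint back onto a sphere of known bone length centred at its parent. The joint positions and the bone length below are hypothetical examples.

```python
import math

def constrain_bone(parent, child, bone_length):
    """Move `child` along the parent->child direction so that its
    distance to `parent` equals the known bone length."""
    d = [c - p for c, p in zip(child, parent)]
    norm = math.sqrt(sum(x * x for x in d))
    if norm == 0.0:
        raise ValueError("parent and child coincide; direction undefined")
    scale = bone_length / norm
    return tuple(p + x * scale for p, x in zip(parent, d))
```

For instance, a noisy elbow estimate at (0.35, 0.05, 0.0) relative to the shoulder is pulled back so that the upper-arm length is exactly the assumed 0.30 m, while its direction from the shoulder is preserved.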
Funding: The National Key R&D Program of China (2018YFB1004600); the National Natural Science Foundation of China (61502187, 61876211); the National Science Foundation Grant CNS (1951952).
Abstract: The field of vision-based human hand three-dimensional (3D) shape and pose estimation has attracted significant attention recently owing to its key role in various applications, such as natural human-computer interaction. With the availability of large-scale annotated hand datasets and the rapid development of deep neural networks (DNNs), numerous DNN-based data-driven methods have been proposed for accurate and rapid hand shape and pose estimation. Nonetheless, complicated hand articulation, depth and scale ambiguities, occlusions, and finger similarity remain challenging. In this study, we present a comprehensive survey of state-of-the-art 3D hand shape and pose estimation approaches using RGB-D cameras. Related RGB-D cameras, hand datasets, and a performance analysis are also discussed to provide a holistic view of recent achievements. We also discuss the research potential of this rapidly growing field.
Funding: Supported by the Henan Province Science and Technology Project under Grant No. 182102210065.
Abstract: Object recognition and location has always been one of the research hotspots in machine vision. It is of great value and significance to the development and application of service robots, industrial automation, unmanned driving, and other fields. In order to realize real-time recognition and location of indoor scene objects, this article proposes an improved YOLOv3 neural network model that combines densely connected networks and residual networks to construct a new YOLOv3 backbone network, which is applied to the detection and recognition of objects in indoor scenes. In this article, a RealSense D415 RGB-D camera is used to obtain the RGB map and depth map, and the actual distance value is calculated after each pixel in the scene image is mapped to the real scene. Experimental results proved that the detection and recognition accuracy and real-time performance of the new network are obviously improved compared with the previous YOLOv3 neural network model in the same scene. More objects can be detected after the improvement of the network that could not be detected with the YOLOv3 network before the improvement. The running time of object detection and recognition is reduced to less than half of the original. This improved network has a certain reference value for practical engineering applications.
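The "actual distance value" computation — mapping a pixel with its depth reading into the real scene — can be sketched with the pinhole model. The intrinsics below are hypothetical placeholders, not the D415's factory calibration, which a real system would query from the camera.

```python
import math

def pixel_to_point(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth (metres) to a 3D point
    in the camera frame, using pinhole intrinsics."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def distance_to_camera(u, v, depth, fx, fy, cx, cy):
    """Euclidean distance from the camera centre to the back-projected point."""
    p = pixel_to_point(u, v, depth, fx, fy, cx, cy)
    return math.sqrt(sum(c * c for c in p))
```

At the principal point the ray is purely along the optical axis, so the distance equals the depth reading; away from the principal point the Euclidean distance exceeds the depth value.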
Funding: The National Natural Science Foundation of China (No. 62173228).
Abstract: Accurate vehicle localization is a key technology for autonomous driving tasks in indoor parking lots, such as automated valet parking. Additionally, infrastructure-based cooperative driving systems have become a means of realizing intelligent driving. In this paper, we propose a novel and practical vehicle localization system using infrastructure-based RGB-D cameras for indoor parking lots. In the proposed system, we design a depth data preprocessing method that is both simple and efficient, to reduce the computational burden resulting from a large amount of data. Meanwhile, hardware synchronization for all cameras in the sensor network is not implemented, because it is extremely cumbersome and would significantly reduce the scalability of our system in mass deployments. Hence, to address the problem of data distortion accompanying vehicle motion, we propose a vehicle localization method that performs template point cloud registration on distributed depth data. Finally, a complete hardware system was built to verify the feasibility of our solution in a real-world environment. Experiments in an indoor parking lot demonstrated the effectiveness and accuracy of the proposed vehicle localization system, with a maximum root mean squared error of 5 cm at 15 Hz compared with the ground truth.
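Template point-cloud registration as used above solves for a rigid transform between a vehicle template and the observed points. In the planar (bird's-eye) case with known correspondences, the least-squares rotation and translation have a closed form (2D Kabsch), sketched here on a toy template; the actual system registers distributed, unsynchronized depth data and is considerably more involved.

```python
import math

def register_2d(template, observed):
    """Closed-form least-squares rigid registration of corresponding 2D
    point sets: returns (theta, tx, ty) such that rotating `template` by
    theta and translating by (tx, ty) best matches `observed`."""
    n = len(template)
    mxt = sum(p[0] for p in template) / n
    myt = sum(p[1] for p in template) / n
    mxo = sum(p[0] for p in observed) / n
    myo = sum(p[1] for p in observed) / n
    s_cos = s_sin = 0.0
    for (xt, yt), (xo, yo) in zip(template, observed):
        at, bt = xt - mxt, yt - myt          # centered template point
        ao, bo = xo - mxo, yo - myo          # centered observed point
        s_cos += at * ao + bt * bo           # dot term
        s_sin += at * bo - bt * ao           # cross term
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    tx = mxo - (c * mxt - s * myt)
    ty = myo - (s * mxt + c * myt)
    return theta, tx, ty
```

Given a synthetic observation produced by rotating a square template by 0.3 rad and translating it by (2, -1), the routine recovers exactly that pose, which is a useful unit test for any registration backend.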
Funding: Financially supported by the National Key Research and Development Program (NKRDP) projects (Grant No. 2023YFD2001100); Major Science and Technology Programs in Henan Province (Grant No. 221100110800); Major Science and Technology Special Project of Henan Province (Longmen Laboratory First-class Project) (Grant No. 231100220200).
Abstract: Recognition of the boundaries of farmland plow areas has an important guiding role in the operation of intelligent agricultural equipment. To precisely recognize these boundaries, a detection method for unmanned tractor plow areas based on RGB-Depth (RGB-D) cameras was proposed, and the feasibility of the detection method was analyzed. This method applies advanced computer vision technology to the field of agricultural automation. Adopting and improving the YOLOv5-seg object segmentation algorithm, first, the Convolutional Block Attention Module (CBAM) was integrated into the Concentrated-Comprehensive Convolution Block (C3) to form C3CBAM, thereby enhancing the ability of the network to extract features from plow areas. The GhostConv module was also utilized to reduce parameter count and computational complexity. Second, using the depth image information provided by the RGB-D camera combined with the results recognized by the YOLOv5-seg model, the mask image was processed to extract contour boundaries, the contours were aligned with the depth map, and the boundary distance information of the plowed area was obtained. Last, based on farmland information, the calculated average boundary distance was corrected, further improving the accuracy of the distance measurements. The experimental results showed that the YOLOv5-seg object segmentation algorithm achieved a recognition accuracy of 99% for plowed areas and that the ranging accuracy improved with decreasing detection distance. The ranging error at 5.5 m was approximately 0.056 m, and the average detection time per frame was 29 ms, which can meet real-time operational requirements. The results of this study can provide precise guarantees for the autonomous operation of unmanned plowing units.
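The final step above — averaging boundary distances while rejecting implausible depth samples — can be sketched as a trimmed mean over the readings gathered along the detected contour. The trim fraction here is a hypothetical choice, not the paper's farmland-based correction.

```python
def trimmed_mean(values, trim_fraction=0.1):
    """Mean of the values after discarding the lowest and highest
    `trim_fraction` of samples; a simple way to suppress depth
    outliers along a detected contour."""
    if not values:
        raise ValueError("no samples")
    s = sorted(values)
    k = int(len(s) * trim_fraction)
    core = s[k:len(s) - k] if len(s) > 2 * k else s
    return sum(core) / len(core)
```

With ten boundary samples around 5.5 m and two gross outliers (a dropout near 0 m and a spike near 10 m), a 10% trim discards exactly the two outliers and the mean of the remaining samples stays at 5.5 m.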
Funding: Supported by the National Natural Science Foundation of China (Project No. 61120106007); Research Grant of Beijing Higher Institution Engineering Research Center; Tsinghua University Initiative Scientific Research Program.
Abstract: 3D scene modeling has long been a fundamental problem in computer graphics and computer vision. With the popularity of consumer-level RGB-D cameras, there is a growing interest in digitizing real-world indoor 3D scenes. However, modeling indoor 3D scenes remains a challenging problem because of the complex structure of interior objects and the poor quality of RGB-D data acquired by consumer-level sensors. Various methods have been proposed to tackle these challenges. In this survey, we provide an overview of recent advances in indoor scene modeling techniques, as well as public datasets and code libraries which can facilitate experiments and evaluation.