Funding: National Natural Science Foundation of China (No. 52101346); Fundamental Research Funds for the Central Universities, China (No. 2232019D3-61); Initial Research Fund for the Young Teachers of Donghua University, China.
Abstract: Robotic grasping plays an important role in service and industrial fields, and whether a robotic arm can grasp an object properly depends on the accuracy of the grasp detection result. To predict grasp positions for known or unknown objects with a modular robotic system, a convolutional neural network (CNN) with residual blocks is proposed, which generates accurate grasp detections for input images of the scene. The proposed model was trained on the standard Cornell grasp dataset and evaluated on its test set, as well as on different types of household objects and cluttered multi-object scenes. On the Cornell grasp dataset, the model achieved accuracies of 95.5% and 93.6% on the image-wise and object-wise splits, respectively, with a detection time of 109 ms per image. The experimental results show that the model can detect grasp positions for single or multiple objects in real time while maintaining good stability and robustness.
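For readers unfamiliar with the building block this abstract refers to, the PyTorch sketch below shows the generic residual (skip-connection) pattern such a CNN is built from. The channel count and layer arrangement are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Generic residual block: two 3x3 convolutions plus an identity skip.
    The channel count is an illustrative assumption, not from the paper."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                          # skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)      # residual addition

# Shape is preserved through the block.
x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```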
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 62073322 and 61633020), the CIE-Tencent Robotics X Rhino-Bird Focused Research Program (No. 2022-07), and the Beijing Natural Science Foundation (No. 2022MQ05).
Abstract: Grasp detection plays a critical role in robot manipulation. Mainstream pixel-wise grasp detection networks with an encoder-decoder structure have received much attention due to their good accuracy and efficiency. However, they usually transmit only the high-level features of the encoder to the decoder, neglecting low-level features. Low-level features contain abundant detail information, and how to fully exploit them remains unsolved; meanwhile, the channel information in high-level features is also not well mined. Inevitably, grasp detection performance is degraded. To solve these problems, we propose a grasp detection network with hierarchical multi-scale feature fusion and inverted shuffle residual. Low-level and high-level features in the encoder are first fused through designed skip connections with an attention module, and the fused information is then propagated to the corresponding decoder layers for in-depth feature fusion. Such hierarchical fusion guarantees the quality of grasp prediction. Furthermore, an inverted shuffle residual module is created, in which the high-level feature from the encoder is split along the channel dimension and the resulting splits are processed in separate branches. This differentiated processing preserves more high-dimensional channel information, which enhances the representation ability of the network. In addition, an information enhancement module is added before the encoder to reinforce the input. The proposed method attains 98.9% image-wise and 97.8% object-wise accuracy on the Cornell grasping dataset, and the experimental results verify its effectiveness.
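The channel-split idea mentioned above can be illustrated with a minimal PyTorch sketch. The branch operations and the placement of the channel shuffle below are assumptions made for illustration; the paper's inverted shuffle residual module may differ in detail.

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups (as in ShuffleNet)."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class SplitBranchBlock(nn.Module):
    """Sketch of a split-process-shuffle residual unit: the input feature is
    split along the channel dimension, each half is processed by its own
    branch, and the halves are re-mixed by a channel shuffle. The branch
    designs are illustrative assumptions, not the paper's exact module."""
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.branch_a = nn.Sequential(
            nn.Conv2d(half, half, 3, padding=1), nn.BatchNorm2d(half), nn.ReLU())
        self.branch_b = nn.Sequential(
            nn.Conv2d(half, half, 1), nn.BatchNorm2d(half), nn.ReLU())

    def forward(self, x):
        a, b = x.chunk(2, dim=1)                   # split in channel
        out = torch.cat([self.branch_a(a), self.branch_b(b)], dim=1)
        return channel_shuffle(out + x, groups=2)  # residual + shuffle

print(SplitBranchBlock(64)(torch.randn(1, 64, 28, 28)).shape)
```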
Funding: Supported by the National Natural Science Foundation of China (No. 92048205) and the China Scholarship Council (No. 202008310014).
Abstract: To balance inference speed and detection accuracy, both of which are important for robot grasping tasks, we propose an encoder-decoder structured pixel-level grasp detection neural network named the attention-based efficient robot grasp detection network (AE-GDN). Three spatial attention modules are introduced in the encoder stages to enhance detail information, and three channel attention modules are introduced in the decoder stages to extract more semantic information. Several lightweight and efficient DenseBlocks connect the encoder and decoder paths to improve the feature modeling capability of AE-GDN. A high intersection over union (IoU) value between the predicted grasp rectangle and the ground truth does not necessarily mean a high-quality grasp configuration and might even cause a collision, because traditional IoU loss calculations treat the center part of the predicted rectangle as being as important as the area around the grippers. We design a new IoU loss based on an hourglass box matching mechanism, which creates good correspondence between high IoU values and high-quality grasp configurations. AE-GDN achieves accuracies of 98.9% and 96.6% on the Cornell and Jacquard datasets, respectively. The inference speed reaches 43.5 frames per second with only about 1.2×10^6 parameters. The proposed AE-GDN has also been deployed on a practical robotic arm grasping system and performs grasping well. Code is available at https://github.com/robvincen/robot_gradet.
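For context, the rectangle metric commonly used to report accuracy on the Cornell and Jacquard datasets, and against which such IoU-based losses are defined, counts a prediction as correct when its rotation angle is within 30° of a ground-truth rectangle and their IoU exceeds 0.25. Below is a minimal sketch of that check using shapely; the (cx, cy, w, h, theta) tuple convention is an assumption for illustration.

```python
import math
from shapely.geometry import Polygon

def grasp_to_polygon(cx, cy, w, h, theta):
    """Corner points of a rotated grasp rectangle (center, size, angle in rad)."""
    dx, dy = w / 2, h / 2
    corners = [(-dx, -dy), (dx, -dy), (dx, dy), (-dx, dy)]
    c, s = math.cos(theta), math.sin(theta)
    return Polygon([(cx + x * c - y * s, cy + x * s + y * c) for x, y in corners])

def is_correct(pred, gt, iou_thresh=0.25, angle_thresh=math.radians(30)):
    """Standard rectangle metric: angle within 30 degrees AND IoU > 0.25.
    A full implementation would compare angles modulo pi, since a grasp
    rectangle is symmetric under a 180-degree rotation."""
    if abs(pred[4] - gt[4]) > angle_thresh:
        return False
    p, g = grasp_to_polygon(*pred), grasp_to_polygon(*gt)
    inter = p.intersection(g).area
    iou = inter / (p.area + g.area - inter)
    return iou > iou_thresh

# Hypothetical prediction vs. ground truth: (cx, cy, w, h, theta)
print(is_correct((100, 100, 60, 20, 0.1), (102, 98, 58, 22, 0.05)))  # True
```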
Funding: Supported by the Central Program of Basic Science of the National Natural Science Foundation of China (No. 72088101) and the National Postdoctoral Program for Innovative Talents (No. BX2021285).
Abstract: Robotic grasping is an essential problem at both the household and industrial levels, and unstructured objects have always been difficult for grippers. Parallel-plate grippers, together with algorithms that focus on partial information of objects, are among the widely used approaches. However, most works predict single-size grasp rectangles for fixed cameras and gripper sizes. In this paper, a multi-scale grasp detector is proposed that predicts grasp rectangles of different sizes on RGB-D or RGB images in real time for hand-eye cameras and various parallel-plate grippers. The detector extracts feature maps at multiple scales and makes predictions on each scale independently. To guarantee independence between scales and efficiency, a fully matching model and a background classifier are applied in the network. Analysis of the Cornell Grasp Dataset shows that the fully matching model can match all labeled grasp rectangles. Furthermore, background classification, along with angle classification and box regression, functions as hard negative mining and a background predictor. The detector is trained and tested on the augmented dataset, which includes images of 320×320 pixels and grasp rectangles ranging from 20 to more than 320 pixels. It achieves up to 98.87% accuracy on the image-wise split and 97.83% on the object-wise split at a speed of more than 22 frames per second. In addition, the detector, though trained on a single-object dataset, can predict grasps on multiple objects.
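A rough sketch of per-scale independent prediction, as described above, is given below in PyTorch. The channel counts, number of angle bins, and head design are assumptions for illustration, not the detector's actual configuration.

```python
import torch
import torch.nn as nn

class MultiScaleGraspHeads(nn.Module):
    """Sketch of per-scale prediction: each feature map gets its own head
    that outputs, per spatial cell, angle-class logits, a background logit,
    and four box-regression offsets. All sizes here are assumptions."""
    def __init__(self, in_channels=(64, 128, 256), num_angles=18):
        super().__init__()
        out_ch = num_angles + 1 + 4  # angle classes + background + (x, y, w, h)
        self.heads = nn.ModuleList(
            nn.Conv2d(c, out_ch, kernel_size=3, padding=1) for c in in_channels)

    def forward(self, features):
        # One independent prediction per scale; no cross-scale mixing.
        return [head(f) for head, f in zip(self.heads, features)]

feats = [torch.randn(1, 64, 40, 40),
         torch.randn(1, 128, 20, 20),
         torch.randn(1, 256, 10, 10)]
for out in MultiScaleGraspHeads()(feats):
    print(out.shape)  # (1, 23, 40, 40), (1, 23, 20, 20), (1, 23, 10, 10)
```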
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2018AAA0103002 and the National Natural Science Foundation of China under Grant Nos. 62172392, 61702482, and 61972379.
Abstract: Grasp detection is a visual recognition task in which a robot uses its sensors to detect graspable objects in its environment. Despite steady progress in robotic grasping, it is still difficult to achieve grasp detection that is both real-time and highly accurate. In this paper, we propose a real-time robotic grasp detection method that accurately predicts potential grasps for parallel-plate robotic grippers from RGB images. Our work employs an end-to-end convolutional neural network consisting of a feature descriptor and a grasp detector, and, for the first time, adds an attention mechanism to the grasp detection task, enabling the network to focus on grasp regions rather than background. We also present an angular label smoothing strategy to enhance the fault tolerance of the network. We evaluate our method quantitatively and qualitatively from different aspects on the public Cornell and Jacquard datasets. Extensive experiments demonstrate that our method outperforms state-of-the-art methods, ranking first on both the Cornell and Jacquard datasets with accuracies of 98.9% and 95.6%, respectively, at real-time speed.
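One plausible reading of angular label smoothing is sketched below: soften the one-hot angle class by moving a little probability mass to the neighbouring bins, wrapping circularly because angles are periodic. The smoothing scheme and the epsilon value are assumptions; the paper's exact strategy may differ.

```python
import torch

def smooth_angle_labels(angle_idx: int, num_bins: int = 18,
                        eps: float = 0.1) -> torch.Tensor:
    """Sketch of angular label smoothing: put 1-eps on the true angle bin
    and split eps over the two adjacent bins (circularly), so near-miss
    angle predictions are penalised less than far-off ones."""
    target = torch.zeros(num_bins)
    target[angle_idx] = 1.0 - eps
    target[(angle_idx - 1) % num_bins] = eps / 2  # wrap: angles are periodic
    target[(angle_idx + 1) % num_bins] = eps / 2
    return target

print(smooth_angle_labels(0))  # mass on bins 17, 0, and 1
# Used with a soft-target cross-entropy, e.g.:
# loss = -(smooth_angle_labels(k) * torch.log_softmax(logits, dim=-1)).sum()
```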