Funding: Beijing Municipal Natural Science Foundation, Grant/Award Number: 4212933; National Natural Science Foundation of China, Grant/Award Number: 61873008; National Key R&D Plan, Grant/Award Number: 2018YFB1307004.
Abstract: The positional information of objects is crucial to enable robots to perform grasping and pushing manipulations in clutter. To perform such manipulations effectively, robots need to perceive the position information of objects, including their coordinates and the spatial relationships between objects (e.g., proximity, adjacency). The authors propose an end-to-end position-aware deep Q-learning framework to achieve efficient collaborative pushing and grasping in clutter. Specifically, a pair of conjugate pushing and grasping attention modules is proposed to capture the position information of objects and generate high-quality affordance maps of operating positions with features of the pushing and grasping operations. In addition, the authors propose an object isolation metric and a clutter metric based on instance segmentation to measure the spatial relationships between objects in cluttered environments. To further enhance the perception of object position information, the authors associate the change in the object isolation metric and the clutter metric before and after performing an action with the reward function. A series of experiments carried out in simulation and the real world indicates that the method improves sample efficiency, task completion rate, grasping success rate, and action efficiency compared to state-of-the-art end-to-end methods. Note that the authors' system can be robustly applied in real-world settings and extended to novel objects. Supplementary material is available at https://youtu.be/NhG_k5v3NnM.
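The reward shaping described above (associating the change in the isolation and clutter metrics with the reward) can be sketched as follows. This is an illustrative assumption, not the authors' implementation: the function names, the linear combination, and the weights `w_iso` and `w_clu` are hypothetical, and the two metrics are assumed to be scalars already computed from instance segmentation.

```python
# Hypothetical sketch of a metric-shaped reward. Assumes a scalar
# isolation metric (higher = objects more separated) and a scalar
# clutter metric (higher = scene more cluttered); both are assumed
# to be derived from instance segmentation, as in the abstract.

def shaped_reward(base_reward: float,
                  isolation_before: float, isolation_after: float,
                  clutter_before: float, clutter_after: float,
                  w_iso: float = 0.5, w_clu: float = 0.5) -> float:
    """Add the change in the spatial metrics to the task reward so that
    actions which separate objects or reduce clutter are encouraged."""
    delta_isolation = isolation_after - isolation_before   # reward separation
    delta_clutter = clutter_before - clutter_after          # reward decluttering
    return base_reward + w_iso * delta_isolation + w_clu * delta_clutter
```

A push that raises isolation and lowers clutter thus earns a bonus on top of any grasp reward, which is one plausible way the metric changes could be tied to the Q-learning signal.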
Funding: Supported by the National Natural Science Foundation of China (No. 61973009).
Abstract: Face detection has made tremendous strides thanks to convolutional neural networks. However, dense face detection remains an open challenge due to large face scale variation, tiny faces, and serious occlusion. This paper presents a robust, dense face detector using global context and visual attention mechanisms, which significantly improve detection accuracy. Specifically, a global context fusion module with top-down feedback is proposed to improve the ability to identify tiny faces. Moreover, a visual attention mechanism is employed to address the problem of occlusion. Experimental results on the public face datasets WIDER FACE and FDDB demonstrate the effectiveness of the proposed method.
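To make the attention idea above concrete, here is a minimal spatial-attention sketch in the spirit of reweighting feature maps so that visible (unoccluded) regions dominate. This is not the authors' architecture; the channel-pooling scheme and the sigmoid gating are generic assumptions, shown with NumPy for self-containment.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    """Elementwise logistic function, squashing values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(features: np.ndarray) -> np.ndarray:
    """Reweight a (C, H, W) feature map by a learned-free spatial mask.

    Channel-wise average and max pooling produce two (H, W) summaries;
    their sum is squashed into (0, 1) and broadcast back over channels,
    so high-response locations are kept and weak ones are suppressed.
    """
    avg_pool = features.mean(axis=0)        # (H, W)
    max_pool = features.max(axis=0)         # (H, W)
    attention = sigmoid(avg_pool + max_pool)  # (H, W), values in (0, 1)
    return features * attention[None, :, :]   # broadcast over channels
```

In a real detector the pooled maps would typically pass through a small convolution with learned weights before the sigmoid; the sketch omits that to stay self-contained.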