Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware reso...Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses parallel multi-resolution branches architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.展开更多
The high bandwidth and low latency of 6G network technology enable the successful application of monocular 3D object detection on vehicle platforms.Monocular 3D-object-detection-based Pseudo-LiDAR is a low-cost,lowpow...The high bandwidth and low latency of 6G network technology enable the successful application of monocular 3D object detection on vehicle platforms.Monocular 3D-object-detection-based Pseudo-LiDAR is a low-cost,lowpower solution compared to LiDAR solutions in the field of autonomous driving.However,this technique has some problems,i.e.,(1)the poor quality of generated Pseudo-LiDAR point clouds resulting from the nonlinear error distribution of monocular depth estimation and(2)the weak representation capability of point cloud features due to the neglected global geometric structure features of point clouds existing in LiDAR-based 3D detection networks.Therefore,we proposed a Pseudo-LiDAR confidence sampling strategy and a hierarchical geometric feature extraction module for monocular 3D object detection.We first designed a point cloud confidence sampling strategy based on a 3D Gaussian distribution to assign small confidence to the points with great error in depth estimation and filter them out according to the confidence.Then,we present a hierarchical geometric feature extraction module by aggregating the local neighborhood features and a dual transformer to capture the global geometric features in the point cloud.Finally,our detection framework is based on Point-Voxel-RCNN(PV-RCNN)with high-quality Pseudo-LiDAR and enriched geometric features as input.From the experimental results,our method achieves satisfactory results in monocular 3D object detection.展开更多
Grasp detection plays a critical role for robot manipulation.Mainstream pixel-wise grasp detection networks with encoder-decoder structure receive much attention due to good accuracy and efficiency.However,they usuall...Grasp detection plays a critical role for robot manipulation.Mainstream pixel-wise grasp detection networks with encoder-decoder structure receive much attention due to good accuracy and efficiency.However,they usually transmit the high-level feature in the encoder to the decoder,and low-level features are neglected.It is noted that low-level features contain abundant detail information,and how to fully exploit low-level features remains unsolved.Meanwhile,the channel information in high-level feature is also not well mined.Inevitably,the performance of grasp detection is degraded.To solve these problems,we propose a grasp detection network with hierarchical multi-scale feature fusion and inverted shuffle residual.Both low-level and high-level features in the encoder are firstly fused by the designed skip connections with attention module,and the fused information is then propagated to corresponding layers of the decoder for in-depth feature fusion.Such a hierarchical fusion guarantees the quality of grasp prediction.Furthermore,an inverted shuffle residual module is created,where the high-level feature from encoder is split in channel and the resultant split features are processed in their respective branches.By such differentiation processing,more high-dimensional channel information is kept,which enhances the representation ability of the network.Besides,an information enhancement module is added before the encoder to reinforce input information.The proposed method attains 98.9%and 97.8%in image-wise and object-wise accuracy on the Cornell grasping dataset,respectively,and the experimental results verify the effectiveness of the method.展开更多
In this paper, we target a similarity search among data supply chains, which plays an essential role in optimizing the supply chain and extending its value. This problem is very challenging for application-oriented da...In this paper, we target a similarity search among data supply chains, which plays an essential role in optimizing the supply chain and extending its value. This problem is very challenging for application-oriented data supply chains because the high complexity of the data supply chain makes the computation of similarity extremely complex and inefficient. In this paper, we propose a feature space representation model based on key points,which can extract the key features from the subsequences of the original data supply chain and simplify it into a feature vector form. Then, we formulate the similarity computation of the subsequences based on the multiscale features. Further, we propose an improved hierarchical clustering algorithm for a similarity search over the data supply chains. The main idea is to separate the subsequences into disjoint groups such that each group meets one specific clustering criteria; thus, the cluster containing the query object is the similarity search result. The experimental results show that the proposed approach is both effective and efficient for data supply chain retrieval.展开更多
文摘Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses parallel multi-resolution branches architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
基金supported by the National Key Research and Development Program of China(2020YFB1807500)the National Natural Science Foundation of China(62072360,62001357,62172438,61901367)+4 种基金the key research and development plan of Shaanxi province(2021ZDLGY02-09,2023-GHZD-44,2023-ZDLGY-54)the Natural Science Foundation of Guangdong Province of China(2022A1515010988)Key Project on Artificial Intelligence of Xi'an Science and Technology Plan(2022JH-RGZN-0003,2022JH-RGZN-0103,2022JH-CLCJ-0053)Xi'an Science and Technology Plan(20RGZN0005)the Proof-ofconcept fund from Hangzhou Research Institute of Xidian University(GNYZ2023QC0201).
文摘The high bandwidth and low latency of 6G network technology enable the successful application of monocular 3D object detection on vehicle platforms.Monocular 3D-object-detection-based Pseudo-LiDAR is a low-cost,lowpower solution compared to LiDAR solutions in the field of autonomous driving.However,this technique has some problems,i.e.,(1)the poor quality of generated Pseudo-LiDAR point clouds resulting from the nonlinear error distribution of monocular depth estimation and(2)the weak representation capability of point cloud features due to the neglected global geometric structure features of point clouds existing in LiDAR-based 3D detection networks.Therefore,we proposed a Pseudo-LiDAR confidence sampling strategy and a hierarchical geometric feature extraction module for monocular 3D object detection.We first designed a point cloud confidence sampling strategy based on a 3D Gaussian distribution to assign small confidence to the points with great error in depth estimation and filter them out according to the confidence.Then,we present a hierarchical geometric feature extraction module by aggregating the local neighborhood features and a dual transformer to capture the global geometric features in the point cloud.Finally,our detection framework is based on Point-Voxel-RCNN(PV-RCNN)with high-quality Pseudo-LiDAR and enriched geometric features as input.From the experimental results,our method achieves satisfactory results in monocular 3D object detection.
基金This work was supported by the National Natural Science Foundation of China(Nos.62073322 and 61633020)the CIE-Tencent Robotics X Rhino-Bird Focused Research Program(No.2022-07)the Beijing Natural Science Foundation(No.2022MQ05).
文摘Grasp detection plays a critical role for robot manipulation.Mainstream pixel-wise grasp detection networks with encoder-decoder structure receive much attention due to good accuracy and efficiency.However,they usually transmit the high-level feature in the encoder to the decoder,and low-level features are neglected.It is noted that low-level features contain abundant detail information,and how to fully exploit low-level features remains unsolved.Meanwhile,the channel information in high-level feature is also not well mined.Inevitably,the performance of grasp detection is degraded.To solve these problems,we propose a grasp detection network with hierarchical multi-scale feature fusion and inverted shuffle residual.Both low-level and high-level features in the encoder are firstly fused by the designed skip connections with attention module,and the fused information is then propagated to corresponding layers of the decoder for in-depth feature fusion.Such a hierarchical fusion guarantees the quality of grasp prediction.Furthermore,an inverted shuffle residual module is created,where the high-level feature from encoder is split in channel and the resultant split features are processed in their respective branches.By such differentiation processing,more high-dimensional channel information is kept,which enhances the representation ability of the network.Besides,an information enhancement module is added before the encoder to reinforce input information.The proposed method attains 98.9%and 97.8%in image-wise and object-wise accuracy on the Cornell grasping dataset,respectively,and the experimental results verify the effectiveness of the method.
基金partly supported by the National Natural Science Foundation of China(Nos.61532012,61370196,and 61672109)
文摘In this paper, we target a similarity search among data supply chains, which plays an essential role in optimizing the supply chain and extending its value. This problem is very challenging for application-oriented data supply chains because the high complexity of the data supply chain makes the computation of similarity extremely complex and inefficient. In this paper, we propose a feature space representation model based on key points,which can extract the key features from the subsequences of the original data supply chain and simplify it into a feature vector form. Then, we formulate the similarity computation of the subsequences based on the multiscale features. Further, we propose an improved hierarchical clustering algorithm for a similarity search over the data supply chains. The main idea is to separate the subsequences into disjoint groups such that each group meets one specific clustering criteria; thus, the cluster containing the query object is the similarity search result. The experimental results show that the proposed approach is both effective and efficient for data supply chain retrieval.