Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware reso...Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses parallel multi-resolution branches architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.展开更多
The precise and automatic segmentation of prostate magnetic resonance imaging(MRI)images is vital for assisting doctors in diagnosing prostate diseases.In recent years,many advanced methods have been applied to prosta...The precise and automatic segmentation of prostate magnetic resonance imaging(MRI)images is vital for assisting doctors in diagnosing prostate diseases.In recent years,many advanced methods have been applied to prostate segmentation,but due to the variability caused by prostate diseases,automatic segmentation of the prostate presents significant challenges.In this paper,we propose an attention-guided multi-scale feature fusion network(AGMSF-Net)to segment prostate MRI images.We propose an attention mechanism for extracting multi-scale features,and introduce a 3D transformer module to enhance global feature representation by adding it during the transition phase from encoder to decoder.In the decoder stage,a feature fusion module is proposed to obtain global context information.We evaluate our model on MRI images of the prostate acquired from a local hospital.The relative volume difference(RVD)and dice similarity coefficient(DSC)between the results of automatic prostate segmentation and ground truth were 1.21%and 93.68%,respectively.To quantitatively evaluate prostate volume on MRI,which is of significant clinical significance,we propose a unique AGMSF-Net.The essential performance evaluation and validation experiments have demonstrated the effectiveness of our method in automatic prostate segmentation.展开更多
The perception module of advanced driver assistance systems plays a vital role.Perception schemes often use a single sensor for data processing and environmental perception or adopt the information processing results ...The perception module of advanced driver assistance systems plays a vital role.Perception schemes often use a single sensor for data processing and environmental perception or adopt the information processing results of various sensors for the fusion of the detection layer.This paper proposes a multi-scale and multi-sensor data fusion strategy in the front end of perception and accomplishes a multi-sensor function disparity map generation scheme.A binocular stereo vision sensor composed of two cameras and a light deterction and ranging(LiDAR)sensor is used to jointly perceive the environment,and a multi-scale fusion scheme is employed to improve the accuracy of the disparity map.This solution not only has the advantages of dense perception of binocular stereo vision sensors but also considers the perception accuracy of LiDAR sensors.Experiments demonstrate that the multi-scale multi-sensor scheme proposed in this paper significantly improves disparity map estimation.展开更多
In order to extract the richer feature information of ship targets from sea clutter, and address the high dimensional data problem, a method termed as multi-scale fusion kernel sparse preserving projection(MSFKSPP) ba...In order to extract the richer feature information of ship targets from sea clutter, and address the high dimensional data problem, a method termed as multi-scale fusion kernel sparse preserving projection(MSFKSPP) based on the maximum margin criterion(MMC) is proposed for recognizing the class of ship targets utilizing the high-resolution range profile(HRRP). Multi-scale fusion is introduced to capture the local and detailed information in small-scale features, and the global and contour information in large-scale features, offering help to extract the edge information from sea clutter and further improving the target recognition accuracy. The proposed method can maximally preserve the multi-scale fusion sparse of data and maximize the class separability in the reduced dimensionality by reproducing kernel Hilbert space. Experimental results on the measured radar data show that the proposed method can effectively extract the features of ship target from sea clutter, further reduce the feature dimensionality, and improve target recognition performance.展开更多
A system for classifying four basic table tennis strokes using wearable devices and deep learning networks is proposed in this study.The wearable device consisted of a six-axis sensor,Raspberry Pi 3,and a power bank.M...A system for classifying four basic table tennis strokes using wearable devices and deep learning networks is proposed in this study.The wearable device consisted of a six-axis sensor,Raspberry Pi 3,and a power bank.Multiple kernel sizes were used in convolutional neural network(CNN)to evaluate their performance for extracting features.Moreover,a multiscale CNN with two kernel sizes was used to perform feature fusion at different scales in a concatenated manner.The CNN achieved recognition of the four table tennis strokes.Experimental data were obtained from20 research participants who wore sensors on the back of their hands while performing the four table tennis strokes in a laboratory environment.The data were collected to verify the performance of the proposed models for wearable devices.Finally,the sensor and multi-scale CNN designed in this study achieved accuracy and F1 scores of 99.58%and 99.16%,respectively,for the four strokes.The accuracy for five-fold cross validation was 99.87%.This result also shows that the multi-scale convolutional neural network has better robustness after fivefold cross validation.展开更多
Grasp detection plays a critical role for robot manipulation.Mainstream pixel-wise grasp detection networks with encoder-decoder structure receive much attention due to good accuracy and efficiency.However,they usuall...Grasp detection plays a critical role for robot manipulation.Mainstream pixel-wise grasp detection networks with encoder-decoder structure receive much attention due to good accuracy and efficiency.However,they usually transmit the high-level feature in the encoder to the decoder,and low-level features are neglected.It is noted that low-level features contain abundant detail information,and how to fully exploit low-level features remains unsolved.Meanwhile,the channel information in high-level feature is also not well mined.Inevitably,the performance of grasp detection is degraded.To solve these problems,we propose a grasp detection network with hierarchical multi-scale feature fusion and inverted shuffle residual.Both low-level and high-level features in the encoder are firstly fused by the designed skip connections with attention module,and the fused information is then propagated to corresponding layers of the decoder for in-depth feature fusion.Such a hierarchical fusion guarantees the quality of grasp prediction.Furthermore,an inverted shuffle residual module is created,where the high-level feature from encoder is split in channel and the resultant split features are processed in their respective branches.By such differentiation processing,more high-dimensional channel information is kept,which enhances the representation ability of the network.Besides,an information enhancement module is added before the encoder to reinforce input information.The proposed method attains 98.9%and 97.8%in image-wise and object-wise accuracy on the Cornell grasping dataset,respectively,and the experimental results verify the effectiveness of the method.展开更多
Although the Faster Region-based Convolutional Neural Network(Faster R-CNN)model has obvious advantages in defect recognition,it still cannot overcome challenging problems,such as time-consuming,small targets,irregula...Although the Faster Region-based Convolutional Neural Network(Faster R-CNN)model has obvious advantages in defect recognition,it still cannot overcome challenging problems,such as time-consuming,small targets,irregular shapes,and strong noise interference in bridge defect detection.To deal with these issues,this paper proposes a novel Multi-scale Feature Fusion(MFF)model for bridge appearance disease detection.First,the Faster R-CNN model adopts Region Of Interest(ROl)pooling,which omits the edge information of the target area,resulting in some missed detections and inaccuracies in both detecting and localizing bridge defects.Therefore,this paper proposes an MFF based on regional feature Aggregation(MFF-A),which reduces the missed detection rate of bridge defect detection and improves the positioning accuracy of the target area.Second,the Faster R-CNN model is insensitive to small targets,irregular shapes,and strong noises in bridge defect detection,which results in a long training time and low recognition accuracy.Accordingly,a novel Lightweight MFF(namely MFF-L)model for bridge appearance defect detection using a lightweight network EfficientNetV2 and a feature pyramid network is proposed,which fuses multi-scale features to shorten the training speed and improve recognition accuracy.Finally,the effectiveness of the proposed method is evaluated on the bridge disease dataset and public computational fluid dynamic dataset.展开更多
In order to improve the detection accuracy of small objects,a neighborhood fusion-based hierarchical parallel feature pyramid network(NFPN)is proposed.Unlike the layer-by-layer structure adopted in the feature pyramid...In order to improve the detection accuracy of small objects,a neighborhood fusion-based hierarchical parallel feature pyramid network(NFPN)is proposed.Unlike the layer-by-layer structure adopted in the feature pyramid network(FPN)and deconvolutional single shot detector(DSSD),where the bottom layer of the feature pyramid network relies on the top layer,NFPN builds the feature pyramid network with no connections between the upper and lower layers.That is,it only fuses shallow features on similar scales.NFPN is highly portable and can be embedded in many models to further boost performance.Extensive experiments on PASCAL VOC 2007,2012,and COCO datasets demonstrate that the NFPN-based SSD without intricate tricks can exceed the DSSD model in terms of detection accuracy and inference speed,especially for small objects,e.g.,4%to 5%higher mAP(mean average precision)than SSD,and 2%to 3%higher mAP than DSSD.On VOC 2007 test set,the NFPN-based SSD with 300×300 input reaches 79.4%mAP at 34.6 frame/s,and the mAP can raise to 82.9%after using the multi-scale testing strategy.展开更多
Background:Accurate mapping of tree species is highly desired in the management and research of plantation forests,whose ecosystem services are currently under threats.Time-series multispectral satellite images,e.g.,f...Background:Accurate mapping of tree species is highly desired in the management and research of plantation forests,whose ecosystem services are currently under threats.Time-series multispectral satellite images,e.g.,from Landsat-8(L8)and Sentinel-2(S2),have been proven useful in mapping general forest types,yet we do not know quantitatively how their spectral features(e.g.,red-edge)and temporal frequency of data acquisitions(e.g.,16-day vs.5-day)contribute to plantation forest mapping to the species level.Moreover,it is unclear to what extent the fusion of L8 and S2 will result in improvements in tree species mapping of northern plantation forests in China.Methods:We designed three sets of classification experiments(i.e.,single-date,multi-date,and spectral-temporal)to evaluate the performances of L8 and S2 data for mapping keystone timber tree species in northern China.We first used seven pairs of L8 and S2 images to evaluate the performances of L8 and S2 key spectral features for separating these tree species across key growing stages.Then we extracted the spectral-temporal features from all available images of different temporal frequency of data acquisition(i.e.,L8 time series,S2 time series,and fusion of L8 and S2)to assess the contribution of image temporal frequency on the accuracy of tree species mapping in the study area.Results:1)S2 outperformed L8 images in all classification experiments,with or without the red edge bands(0.4%–3.4%and 0.2%–4.4%higher for overall accuracy and macro-F1,respectively);2)NDTI(the ratio of SWIR1 minus SWIR2 to SWIR1 plus SWIR2)and Tasseled Cap coefficients were most important features in all the classifications,and for time-series experiments,the spectral-temporal features of red band-related vegetation indices were most useful;3)increasing the temporal frequency of data acquisition can improve overall accuracy of tree species mapping for up to 3.2%(from 90.1%using single-date imagery to 93.3%using S2 time-series),yet similar overall accuracies were achieved using S2 time-series(93.3%)and the fusion of S2 and L8(93.2%).Conclusions:This study quantifies the contributions of L8 and S2 spectral and temporal features in mapping keystone tree species of northern plantation forests in China and suggests that for mapping tree species in China's northern plantation forests,the effects of increasing the temporal frequency of data acquisition could saturate quickly after using only two images from key phenological stages.展开更多
With the remarkable advancements in machine vision research and its ever-expanding applications,scholars have increasingly focused on harnessing various vision methodologies within the industrial realm.Specifically,de...With the remarkable advancements in machine vision research and its ever-expanding applications,scholars have increasingly focused on harnessing various vision methodologies within the industrial realm.Specifically,detecting vehicle floor welding points poses unique challenges,including high operational costs and limited portability in practical settings.To address these challenges,this paper innovatively integrates template matching and the Faster RCNN algorithm,presenting an industrial fusion cascaded solder joint detection algorithm that seamlessly blends template matching with deep learning techniques.This algorithm meticulously weights and fuses the optimized features of both methodologies,enhancing the overall detection capabilities.Furthermore,it introduces an optimized multi-scale and multi-template matching approach,leveraging a diverse array of templates and image pyramid algorithms to bolster the accuracy and resilience of object detection.By integrating deep learning algorithms with this multi-scale and multi-template matching strategy,the cascaded target matching algorithm effectively accurately identifies solder joint types and positions.A comprehensive welding point dataset,labeled by experts specifically for vehicle detection,was constructed based on images from authentic industrial environments to validate the algorithm’s performance.Experiments demonstrate the algorithm’s compelling performance in industrial scenarios,outperforming the single-template matching algorithm by 21.3%,the multi-scale and multitemplate matching algorithm by 3.4%,the Faster RCNN algorithm by 19.7%,and the YOLOv9 algorithm by 17.3%in terms of solder joint detection accuracy.This optimized algorithm exhibits remarkable robustness and portability,ideally suited for detecting solder joints across diverse vehicle workpieces.Notably,this study’s dataset and feature fusion approach can be a valuable resource for other algorithms seeking to enhance their solder joint detection capabilities.This work thus not only presents a novel and effective solution for industrial solder joint detection but lays the groundwork for future advancements in this critical area.展开更多
To overcome the problem that a single feature can not reflect the state of machinery in different stages,a method of vibration feature fusion based on self-organizing map(SOM) is presented.Minimum quantization error(M...To overcome the problem that a single feature can not reflect the state of machinery in different stages,a method of vibration feature fusion based on self-organizing map(SOM) is presented.Minimum quantization error(MQE) is obtained unsupervised based on SOM network.And trend information of the MQE curve is extracted by the wavelet packet to enhance state differentiating.Experimental flat is designed for bearing accelerating fatigue.And experimental results show that the method of vibration feature fusion based on SOM can reflect the state of machinery in different stages effectively.展开更多
Vehicle Color Recognition(VCR)plays a vital role in intelligent traffic management and criminal investigation assistance.However,the existing vehicle color datasets only cover 13 classes,which can not meet the current...Vehicle Color Recognition(VCR)plays a vital role in intelligent traffic management and criminal investigation assistance.However,the existing vehicle color datasets only cover 13 classes,which can not meet the current actual demand.Besides,although lots of efforts are devoted to VCR,they suffer from the problem of class imbalance in datasets.To address these challenges,in this paper,we propose a novel VCR method based on Smooth Modulation Neural Network with Multi-Scale Feature Fusion(SMNN-MSFF).Specifically,to construct the benchmark of model training and evaluation,we first present a new VCR dataset with 24 vehicle classes,Vehicle Color-24,consisting of 10091 vehicle images from a 100-hour urban road surveillance video.Then,to tackle the problem of long-tail distribution and improve the recognition performance,we propose the SMNN-MSFF model with multiscale feature fusion and smooth modulation.The former aims to extract feature information from local to global,and the latter could increase the loss of the images of tail class instances for training with class-imbalance.Finally,comprehensive experimental evaluation on Vehicle Color-24 and previously three representative datasets demonstrate that our proposed SMNN-MSFF outperformed state-of-the-art VCR methods.And extensive ablation studies also demonstrate that each module of our method is effective,especially,the smooth modulation efficiently help feature learning of the minority or tail classes.Vehicle Color-24 and the code of SMNN-MSFF are publicly available and can contact the author to obtain.展开更多
In this paper,based on a bidirectional parallel multi-branch feature pyramid network(BPMFPN),a novel one-stage object detector called BPMFPN Det is proposed for real-time detection of ground multi-scale targets by swa...In this paper,based on a bidirectional parallel multi-branch feature pyramid network(BPMFPN),a novel one-stage object detector called BPMFPN Det is proposed for real-time detection of ground multi-scale targets by swarm unmanned aerial vehicles(UAVs).First,the bidirectional parallel multi-branch convolution modules are used to construct the feature pyramid to enhance the feature expression abilities of different scale feature layers.Next,the feature pyramid is integrated into the single-stage object detection framework to ensure real-time performance.In order to validate the effectiveness of the proposed algorithm,experiments are conducted on four datasets.For the PASCAL VOC dataset,the proposed algorithm achieves the mean average precision(mAP)of 85.4 on the VOC 2007 test set.With regard to the detection in optical remote sensing(DIOR)dataset,the proposed algorithm achieves 73.9 mAP.For vehicle detection in aerial imagery(VEDAI)dataset,the detection accuracy of small land vehicle(slv)targets reaches 97.4 mAP.For unmanned aerial vehicle detection and tracking(UAVDT)dataset,the proposed BPMFPN Det achieves the mAP of 48.75.Compared with the previous state-of-the-art methods,the results obtained by the proposed algorithm are more competitive.The experimental results demonstrate that the proposed algorithm can effectively solve the problem of real-time detection of ground multi-scale targets in aerial images of swarm UAVs.展开更多
Face anti-spoofing is used to assist face recognition system to judge whether the detected face is real face or fake face. In the traditional face anti-spoofing methods, features extracted by hand are used to describe...Face anti-spoofing is used to assist face recognition system to judge whether the detected face is real face or fake face. In the traditional face anti-spoofing methods, features extracted by hand are used to describe the difference between living face and fraudulent face. But these handmade features do not apply to different variations in an unconstrained environment. The convolutional neural network(CNN) for face deceptions achieves considerable results. However, most existing neural network-based methods simply use neural networks to extract single-scale features from single-modal data, while ignoring multi-scale and multi-modal information. To address this problem, a novel face anti-spoofing method based on multi-modal and multi-scale features fusion(MMFF) is proposed. Specifically, first residual network(Resnet)-34 is adopted to extract features of different scales from each modality, then these features of different scales are fused by feature pyramid network(FPN), finally squeeze-and-excitation fusion(SEF) module and self-attention network(SAN) are combined to fuse features from different modalities for classification. Experiments on the CASIA-SURF dataset show that the new method based on MMFF achieves better performance compared with most existing methods.展开更多
基于视频的目标检测在恶劣天气情况下识别效果较差,故需弥补视频缺陷、提高检测框架的鲁棒性。针对此问题,文中设计了一个基于雷达和视频融合的目标检测框架,利用YOLOv5(You Only Look Once version 5)网络获得图片特征图与图片检测框,...基于视频的目标检测在恶劣天气情况下识别效果较差,故需弥补视频缺陷、提高检测框架的鲁棒性。针对此问题,文中设计了一个基于雷达和视频融合的目标检测框架,利用YOLOv5(You Only Look Once version 5)网络获得图片特征图与图片检测框,利用基于密度的聚类获得雷达检测框,并将雷达数据进行编码,得到基于雷达信息的目标检测结果。最后将两者的检测框叠加得到新ROI(Region of Interest),并得到融合雷达信息后的分类向量,提高了在极端天气下检测的准确率。实验结果表明,该框架的mAP(mean Average Precision)达到了60.07%,且参数量仅为7.64×10^(6),表明该框架具有轻量级、计算速度快、鲁棒性高等特点,可以被广泛应用于嵌入式与移动端平台。展开更多
文摘Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses parallel multi-resolution branches architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
基金This work was supported in part by the National Natural Science Foundation of China(Grant#:82260362)in part by the National Key R&D Program of China(Grant#:2021ZD0111000)+1 种基金in part by the Key R&D Project of Hainan Province(Grant#:ZDYF2021SHFZ243)in part by the Major Science and Technology Project of Haikou(Grant#:2020-009).
文摘The precise and automatic segmentation of prostate magnetic resonance imaging(MRI)images is vital for assisting doctors in diagnosing prostate diseases.In recent years,many advanced methods have been applied to prostate segmentation,but due to the variability caused by prostate diseases,automatic segmentation of the prostate presents significant challenges.In this paper,we propose an attention-guided multi-scale feature fusion network(AGMSF-Net)to segment prostate MRI images.We propose an attention mechanism for extracting multi-scale features,and introduce a 3D transformer module to enhance global feature representation by adding it during the transition phase from encoder to decoder.In the decoder stage,a feature fusion module is proposed to obtain global context information.We evaluate our model on MRI images of the prostate acquired from a local hospital.The relative volume difference(RVD)and dice similarity coefficient(DSC)between the results of automatic prostate segmentation and ground truth were 1.21%and 93.68%,respectively.To quantitatively evaluate prostate volume on MRI,which is of significant clinical significance,we propose a unique AGMSF-Net.The essential performance evaluation and validation experiments have demonstrated the effectiveness of our method in automatic prostate segmentation.
基金the National Key R&D Program of China(2018AAA0103103).
文摘The perception module of advanced driver assistance systems plays a vital role.Perception schemes often use a single sensor for data processing and environmental perception or adopt the information processing results of various sensors for the fusion of the detection layer.This paper proposes a multi-scale and multi-sensor data fusion strategy in the front end of perception and accomplishes a multi-sensor function disparity map generation scheme.A binocular stereo vision sensor composed of two cameras and a light deterction and ranging(LiDAR)sensor is used to jointly perceive the environment,and a multi-scale fusion scheme is employed to improve the accuracy of the disparity map.This solution not only has the advantages of dense perception of binocular stereo vision sensors but also considers the perception accuracy of LiDAR sensors.Experiments demonstrate that the multi-scale multi-sensor scheme proposed in this paper significantly improves disparity map estimation.
基金supported by the National Natural Science Foundation of China (62271255,61871218)the Fundamental Research Funds for the Central University (3082019NC2019002)+1 种基金the Aeronautical Science Foundation (ASFC-201920007002)the Program of Remote Sensing Intelligent Monitoring and Emergency Services for Regional Security Elements。
文摘In order to extract the richer feature information of ship targets from sea clutter, and address the high dimensional data problem, a method termed as multi-scale fusion kernel sparse preserving projection(MSFKSPP) based on the maximum margin criterion(MMC) is proposed for recognizing the class of ship targets utilizing the high-resolution range profile(HRRP). Multi-scale fusion is introduced to capture the local and detailed information in small-scale features, and the global and contour information in large-scale features, offering help to extract the edge information from sea clutter and further improving the target recognition accuracy. The proposed method can maximally preserve the multi-scale fusion sparse of data and maximize the class separability in the reduced dimensionality by reproducing kernel Hilbert space. Experimental results on the measured radar data show that the proposed method can effectively extract the features of ship target from sea clutter, further reduce the feature dimensionality, and improve target recognition performance.
基金supporting of the Ministry of Science and Technology MOST(Grant No.MOST 108–2221-E-150–022-MY3,MOST 110–2634-F-019–002)the National Taiwan Ocean University,China.
文摘A system for classifying four basic table tennis strokes using wearable devices and deep learning networks is proposed in this study.The wearable device consisted of a six-axis sensor,Raspberry Pi 3,and a power bank.Multiple kernel sizes were used in convolutional neural network(CNN)to evaluate their performance for extracting features.Moreover,a multiscale CNN with two kernel sizes was used to perform feature fusion at different scales in a concatenated manner.The CNN achieved recognition of the four table tennis strokes.Experimental data were obtained from20 research participants who wore sensors on the back of their hands while performing the four table tennis strokes in a laboratory environment.The data were collected to verify the performance of the proposed models for wearable devices.Finally,the sensor and multi-scale CNN designed in this study achieved accuracy and F1 scores of 99.58%and 99.16%,respectively,for the four strokes.The accuracy for five-fold cross validation was 99.87%.This result also shows that the multi-scale convolutional neural network has better robustness after fivefold cross validation.
基金This work was supported by the National Natural Science Foundation of China(Nos.62073322 and 61633020)the CIE-Tencent Robotics X Rhino-Bird Focused Research Program(No.2022-07)the Beijing Natural Science Foundation(No.2022MQ05).
文摘Grasp detection plays a critical role for robot manipulation.Mainstream pixel-wise grasp detection networks with encoder-decoder structure receive much attention due to good accuracy and efficiency.However,they usually transmit the high-level feature in the encoder to the decoder,and low-level features are neglected.It is noted that low-level features contain abundant detail information,and how to fully exploit low-level features remains unsolved.Meanwhile,the channel information in high-level feature is also not well mined.Inevitably,the performance of grasp detection is degraded.To solve these problems,we propose a grasp detection network with hierarchical multi-scale feature fusion and inverted shuffle residual.Both low-level and high-level features in the encoder are firstly fused by the designed skip connections with attention module,and the fused information is then propagated to corresponding layers of the decoder for in-depth feature fusion.Such a hierarchical fusion guarantees the quality of grasp prediction.Furthermore,an inverted shuffle residual module is created,where the high-level feature from encoder is split in channel and the resultant split features are processed in their respective branches.By such differentiation processing,more high-dimensional channel information is kept,which enhances the representation ability of the network.Besides,an information enhancement module is added before the encoder to reinforce input information.The proposed method attains 98.9%and 97.8%in image-wise and object-wise accuracy on the Cornell grasping dataset,respectively,and the experimental results verify the effectiveness of the method.
基金This work was supported by the National Natural Science Foundation of China(No.61976247)the Major R&D Programs of China(No.2019YFB-1310400).
文摘Although the Faster Region-based Convolutional Neural Network(Faster R-CNN)model has obvious advantages in defect recognition,it still cannot overcome challenging problems,such as time-consuming,small targets,irregular shapes,and strong noise interference in bridge defect detection.To deal with these issues,this paper proposes a novel Multi-scale Feature Fusion(MFF)model for bridge appearance disease detection.First,the Faster R-CNN model adopts Region Of Interest(ROl)pooling,which omits the edge information of the target area,resulting in some missed detections and inaccuracies in both detecting and localizing bridge defects.Therefore,this paper proposes an MFF based on regional feature Aggregation(MFF-A),which reduces the missed detection rate of bridge defect detection and improves the positioning accuracy of the target area.Second,the Faster R-CNN model is insensitive to small targets,irregular shapes,and strong noises in bridge defect detection,which results in a long training time and low recognition accuracy.Accordingly,a novel Lightweight MFF(namely MFF-L)model for bridge appearance defect detection using a lightweight network EfficientNetV2 and a feature pyramid network is proposed,which fuses multi-scale features to shorten the training speed and improve recognition accuracy.Finally,the effectiveness of the proposed method is evaluated on the bridge disease dataset and public computational fluid dynamic dataset.
基金The National Natural Science Foundation of China(No.61603091)。
文摘In order to improve the detection accuracy of small objects,a neighborhood fusion-based hierarchical parallel feature pyramid network(NFPN)is proposed.Unlike the layer-by-layer structure adopted in the feature pyramid network(FPN)and deconvolutional single shot detector(DSSD),where the bottom layer of the feature pyramid network relies on the top layer,NFPN builds the feature pyramid network with no connections between the upper and lower layers.That is,it only fuses shallow features on similar scales.NFPN is highly portable and can be embedded in many models to further boost performance.Extensive experiments on PASCAL VOC 2007,2012,and COCO datasets demonstrate that the NFPN-based SSD without intricate tricks can exceed the DSSD model in terms of detection accuracy and inference speed,especially for small objects,e.g.,4%to 5%higher mAP(mean average precision)than SSD,and 2%to 3%higher mAP than DSSD.On VOC 2007 test set,the NFPN-based SSD with 300×300 input reaches 79.4%mAP at 34.6 frame/s,and the mAP can raise to 82.9%after using the multi-scale testing strategy.
基金supported by National Natural Science Foundation of China(Grant No.41901382)Open Fund of State Key Laboratory of Remote Sensing Science(Grant No.OFSLRSS201917)the HZAU research startup fund(No.11041810340,No.11041810341).
文摘Background:Accurate mapping of tree species is highly desired in the management and research of plantation forests,whose ecosystem services are currently under threats.Time-series multispectral satellite images,e.g.,from Landsat-8(L8)and Sentinel-2(S2),have been proven useful in mapping general forest types,yet we do not know quantitatively how their spectral features(e.g.,red-edge)and temporal frequency of data acquisitions(e.g.,16-day vs.5-day)contribute to plantation forest mapping to the species level.Moreover,it is unclear to what extent the fusion of L8 and S2 will result in improvements in tree species mapping of northern plantation forests in China.Methods:We designed three sets of classification experiments(i.e.,single-date,multi-date,and spectral-temporal)to evaluate the performances of L8 and S2 data for mapping keystone timber tree species in northern China.We first used seven pairs of L8 and S2 images to evaluate the performances of L8 and S2 key spectral features for separating these tree species across key growing stages.Then we extracted the spectral-temporal features from all available images of different temporal frequency of data acquisition(i.e.,L8 time series,S2 time series,and fusion of L8 and S2)to assess the contribution of image temporal frequency on the accuracy of tree species mapping in the study area.Results:1)S2 outperformed L8 images in all classification experiments,with or without the red edge bands(0.4%–3.4%and 0.2%–4.4%higher for overall accuracy and macro-F1,respectively);2)NDTI(the ratio of SWIR1 minus SWIR2 to SWIR1 plus SWIR2)and Tasseled Cap coefficients were most important features in all the classifications,and for time-series experiments,the spectral-temporal features of red band-related vegetation indices were most useful;3)increasing the temporal frequency of data acquisition can improve overall accuracy of tree species mapping for up to 3.2%(from 90.1%using single-date imagery to 93.3%using S2 time-series),yet similar overall accuracies were achieved using S2 time-series(93.3%)and the fusion of S2 and L8(93.2%).Conclusions:This study quantifies the contributions of L8 and S2 spectral and temporal features in mapping keystone tree species of northern plantation forests in China and suggests that for mapping tree species in China's northern plantation forests,the effects of increasing the temporal frequency of data acquisition could saturate quickly after using only two images from key phenological stages.
基金supported in part by the National Key Research Project of China under Grant No.2023YFA1009402General Science and Technology Plan Items in Zhejiang Province ZJKJT-2023-02.
文摘With the remarkable advancements in machine vision research and its ever-expanding applications,scholars have increasingly focused on harnessing various vision methodologies within the industrial realm.Specifically,detecting vehicle floor welding points poses unique challenges,including high operational costs and limited portability in practical settings.To address these challenges,this paper innovatively integrates template matching and the Faster RCNN algorithm,presenting an industrial fusion cascaded solder joint detection algorithm that seamlessly blends template matching with deep learning techniques.This algorithm meticulously weights and fuses the optimized features of both methodologies,enhancing the overall detection capabilities.Furthermore,it introduces an optimized multi-scale and multi-template matching approach,leveraging a diverse array of templates and image pyramid algorithms to bolster the accuracy and resilience of object detection.By integrating deep learning algorithms with this multi-scale and multi-template matching strategy,the cascaded target matching algorithm effectively accurately identifies solder joint types and positions.A comprehensive welding point dataset,labeled by experts specifically for vehicle detection,was constructed based on images from authentic industrial environments to validate the algorithm’s performance.Experiments demonstrate the algorithm’s compelling performance in industrial scenarios,outperforming the single-template matching algorithm by 21.3%,the multi-scale and multitemplate matching algorithm by 3.4%,the Faster RCNN algorithm by 19.7%,and the YOLOv9 algorithm by 17.3%in terms of solder joint detection accuracy.This optimized algorithm exhibits remarkable robustness and portability,ideally suited for detecting solder joints across diverse vehicle workpieces.Notably,this study’s dataset and feature fusion approach can be a valuable resource for other algorithms seeking to enhance their solder joint detection capabilities.This work thus not only presents a novel and effective solution for industrial solder joint detection but lays the groundwork for future advancements in this critical area.
文摘To overcome the problem that a single feature can not reflect the state of machinery in different stages,a method of vibration feature fusion based on self-organizing map(SOM) is presented.Minimum quantization error(MQE) is obtained unsupervised based on SOM network.And trend information of the MQE curve is extracted by the wavelet packet to enhance state differentiating.Experimental flat is designed for bearing accelerating fatigue.And experimental results show that the method of vibration feature fusion based on SOM can reflect the state of machinery in different stages effectively.
基金This work was supported by the National Natural Science Foundation of China(Grant No.62071378)the Shaanxi Province International Science and Technology Cooperation Program(2022KW-04)the Xi’an Science and Technology Plan Project(21XJZZ0072).
文摘Vehicle Color Recognition(VCR)plays a vital role in intelligent traffic management and criminal investigation assistance.However,the existing vehicle color datasets only cover 13 classes,which can not meet the current actual demand.Besides,although lots of efforts are devoted to VCR,they suffer from the problem of class imbalance in datasets.To address these challenges,in this paper,we propose a novel VCR method based on Smooth Modulation Neural Network with Multi-Scale Feature Fusion(SMNN-MSFF).Specifically,to construct the benchmark of model training and evaluation,we first present a new VCR dataset with 24 vehicle classes,Vehicle Color-24,consisting of 10091 vehicle images from a 100-hour urban road surveillance video.Then,to tackle the problem of long-tail distribution and improve the recognition performance,we propose the SMNN-MSFF model with multiscale feature fusion and smooth modulation.The former aims to extract feature information from local to global,and the latter could increase the loss of the images of tail class instances for training with class-imbalance.Finally,comprehensive experimental evaluation on Vehicle Color-24 and previously three representative datasets demonstrate that our proposed SMNN-MSFF outperformed state-of-the-art VCR methods.And extensive ablation studies also demonstrate that each module of our method is effective,especially,the smooth modulation efficiently help feature learning of the minority or tail classes.Vehicle Color-24 and the code of SMNN-MSFF are publicly available and can contact the author to obtain.
文摘In this paper,based on a bidirectional parallel multi-branch feature pyramid network(BPMFPN),a novel one-stage object detector called BPMFPN Det is proposed for real-time detection of ground multi-scale targets by swarm unmanned aerial vehicles(UAVs).First,the bidirectional parallel multi-branch convolution modules are used to construct the feature pyramid to enhance the feature expression abilities of different scale feature layers.Next,the feature pyramid is integrated into the single-stage object detection framework to ensure real-time performance.In order to validate the effectiveness of the proposed algorithm,experiments are conducted on four datasets.For the PASCAL VOC dataset,the proposed algorithm achieves the mean average precision(mAP)of 85.4 on the VOC 2007 test set.With regard to the detection in optical remote sensing(DIOR)dataset,the proposed algorithm achieves 73.9 mAP.For vehicle detection in aerial imagery(VEDAI)dataset,the detection accuracy of small land vehicle(slv)targets reaches 97.4 mAP.For unmanned aerial vehicle detection and tracking(UAVDT)dataset,the proposed BPMFPN Det achieves the mAP of 48.75.Compared with the previous state-of-the-art methods,the results obtained by the proposed algorithm are more competitive.The experimental results demonstrate that the proposed algorithm can effectively solve the problem of real-time detection of ground multi-scale targets in aerial images of swarm UAVs.
基金supported by the National Natural Science Foundation of China(61962010,62262005)the Natural Science Foundation of Guizhou Priovince(QianKeHeJichu[2019]1425).
文摘Face anti-spoofing is used to assist face recognition system to judge whether the detected face is real face or fake face. In the traditional face anti-spoofing methods, features extracted by hand are used to describe the difference between living face and fraudulent face. But these handmade features do not apply to different variations in an unconstrained environment. The convolutional neural network(CNN) for face deceptions achieves considerable results. However, most existing neural network-based methods simply use neural networks to extract single-scale features from single-modal data, while ignoring multi-scale and multi-modal information. To address this problem, a novel face anti-spoofing method based on multi-modal and multi-scale features fusion(MMFF) is proposed. Specifically, first residual network(Resnet)-34 is adopted to extract features of different scales from each modality, then these features of different scales are fused by feature pyramid network(FPN), finally squeeze-and-excitation fusion(SEF) module and self-attention network(SAN) are combined to fuse features from different modalities for classification. Experiments on the CASIA-SURF dataset show that the new method based on MMFF achieves better performance compared with most existing methods.
文摘基于视频的目标检测在恶劣天气情况下识别效果较差,故需弥补视频缺陷、提高检测框架的鲁棒性。针对此问题,文中设计了一个基于雷达和视频融合的目标检测框架,利用YOLOv5(You Only Look Once version 5)网络获得图片特征图与图片检测框,利用基于密度的聚类获得雷达检测框,并将雷达数据进行编码,得到基于雷达信息的目标检测结果。最后将两者的检测框叠加得到新ROI(Region of Interest),并得到融合雷达信息后的分类向量,提高了在极端天气下检测的准确率。实验结果表明,该框架的mAP(mean Average Precision)达到了60.07%,且参数量仅为7.64×10^(6),表明该框架具有轻量级、计算速度快、鲁棒性高等特点,可以被广泛应用于嵌入式与移动端平台。