Unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain. However, advanced existing approaches employ a multi-generator mechanism to model the different domain mappings, which results in inefficient training of neural networks and mode collapse, limiting the diversity of the generated images. To address this issue, this paper introduces a multi-modal unsupervised image translation framework that uses a single generator to perform multi-modal image translation. First, a domain code is introduced to explicitly control the different generation tasks. Second, the squeeze-and-excitation (SE) mechanism and a feature attention (FA) module are incorporated. Finally, the model integrates multiple optimization objectives to ensure efficient multi-modal translation. Qualitative and quantitative experiments on multiple unpaired benchmark image translation datasets demonstrate the benefits of the proposed method over existing technologies. Overall, the experimental results show that the proposed method is versatile and scalable.
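The squeeze-and-excitation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used SE formulation (global average pooling, a two-layer gate, channel rescaling); the abstract does not say how the authors configure it, so the function name `se_block` and the weight matrices `w1` and `w2` are hypothetical.

```python
import math

def se_block(feature_map, w1, w2):
    """Squeeze-and-excitation on a feature map shaped [channel][row][col].

    w1 (hidden x channels) and w2 (channels x hidden) are hypothetical
    weights; a trained network would learn them.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[v * g for v in row] for row in ch]
        for ch, g in zip(feature_map, gates)
    ]
    return scaled, gates
```

With all-zero weights every gate is sigmoid(0) = 0.5, so each channel is simply halved; learned weights would instead emphasize informative channels.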
The infrastructure and construction of roads are crucial for the economic and social development of a region, but traffic-related challenges such as accidents and congestion persist. Artificial Intelligence (AI) and Machine Learning (ML) have been used in road infrastructure and construction, particularly with Internet of Things (IoT) devices. Object detection in computer vision also plays a key role in improving road infrastructure and addressing traffic-related problems. This study aims to use You Only Look Once version 7 (YOLOv7), described as the most optimized object-detection algorithm, together with the Convolutional Block Attention Module (CBAM) to detect and identify traffic signs, and to analyze effective combinations of optimizers such as Adaptive Moment Estimation (Adam), Root Mean Squared Propagation (RMSprop), and Stochastic Gradient Descent (SGD) with YOLOv7. Using a portion of the German traffic sign data for training, the study investigates the feasibility of adopting smaller datasets while maintaining high accuracy. The proposed model not only improves traffic safety by detecting traffic signs but also has the potential to contribute to the rapid development of autonomous vehicle systems. The results show an accuracy of 99.7% with a batch size of 8 and the Adam optimizer. This high accuracy demonstrates the effectiveness of the proposed model for the image classification task of traffic sign recognition.
Medical image classification has played an important role in the medical field, and deep learning has become an important and powerful technique for it. In this article, we propose a simplified inception module based Hadamard attention (SI + HA) mechanism for medical image classification. Specifically, we propose a new attention mechanism, the Hadamard attention mechanism, which improves the accuracy of medical image classification without greatly increasing the complexity of the model. Meanwhile, we adopt a simplified inception module to improve the utilization of parameters. We use two medical image datasets to demonstrate the superiority of the proposed method. On the BreakHis dataset, the AUCs of our method reach 98.74%, 98.38%, 98.61% and 97.67% under magnification factors of 40×, 100×, 200× and 400×, respectively, and the accuracies reach 95.67%, 94.17%, 94.53% and 94.12% under the same magnification factors. On the KIMIA Path 960 dataset, the AUC and accuracy of our method reach 99.91% and 99.03%. The method is superior to currently popular methods and can significantly improve the effectiveness of medical image classification.
Prediction, prevention, and control of forest fires are crucial at all scales. Developing effective fire detection systems can aid in their control. This study proposes a novel convolutional neural network (CNN) with an attention blocks module, which combines an attention module with numerous input layers to enhance network performance. The suggested model focuses on predicting the area damaged (burned) by possible wildfires and on evaluating the multilateral interactions between the pertinent factors. The results show the impact of a CNN using attention blocks for feature extraction and help to better understand how ecosystems are affected by meteorological factors. For the selected meteorological data, an RMSE of 12.08 and an MAE of 7.45 indicate higher predictive power in selecting relevant and necessary features, giving optimal performance at lower operational and computational cost. These findings show that the suggested strategy is reliable and effective for planning and managing fire-prone regions as well as for predicting forest fire damage.
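For readers unfamiliar with the two error metrics quoted above, the following sketch shows how RMSE and MAE are computed from predicted and observed values. The sample numbers in the usage note are made up for illustration and are not taken from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more strongly."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, with observed burned areas `[0.0, 0.0]` and predictions `[3.0, 4.0]`, `mae` returns 3.5 and `rmse` returns the square root of 12.5. Both are errors, so lower values indicate a better fit.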
The detection of foreign object intrusion is crucial for ensuring the safety of railway operations. To address the low efficiency, suboptimal detection accuracy, and slow detection speed of conventional comprehensive video monitoring systems for railways, a railway foreign object intrusion recognition and detection system is designed and implemented using edge computing and deep learning technologies. To raise detection accuracy, the convolutional block attention module (CBAM), which includes spatial and channel attention modules, is integrated into the YOLOv5 model, giving the CBAM-YOLOv5 model. Furthermore, the distance intersection-over-union non-maximum suppression (DIoU_NMS) algorithm is employed in place of the weighted non-maximum suppression algorithm, improving detection performance for intrusive targets. To accelerate detection, the model is pruned based on the batch normalization (BN) layer and TensorRT inference acceleration is applied, allowing the algorithm to be deployed on edge devices. The CBAM-YOLOv5 model shows a 2.1% improvement in detection accuracy on a self-constructed railway dataset, achieving a mean average precision (mAP) of 95.0%. The inference speed on edge devices reaches 15 frames/s.
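The DIoU_NMS step can be sketched as follows. This assumes the commonly used definition of DIoU (IoU minus the squared distance between box centres divided by the squared diagonal of the smallest enclosing box); the abstract does not give the formula or the threshold, so the default `threshold=0.5` and the box format `(x1, y1, x2, y2)` are assumptions.

```python
def diou(a, b):
    """Distance-IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centres.
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
       + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
       + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - d2 / c2 if c2 > 0 else iou

def diou_nms(boxes, scores, threshold=0.5):
    """Keep the highest-scoring boxes, suppressing those whose DIoU with a kept box reaches the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if diou(boxes[best], boxes[i]) < threshold]
    return keep
```

Because the centre-distance penalty lowers the score of boxes whose centres are far apart, two overlapping boxes that belong to different nearby objects are less likely to suppress each other than under plain IoU-based NMS.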
Effective small object detection is crucial in various applications, including urban intelligent transportation and pedestrian detection. However, small objects are difficult to detect accurately because they contain less information. Many current methods, particularly those based on the Feature Pyramid Network (FPN), address this challenge by leveraging multi-scale feature fusion. However, existing FPN-based methods often suffer from inadequate feature fusion due to varying resolutions across different layers, leading to suboptimal small object detection. To address this problem, we propose the Two-layer Attention Feature Pyramid Network (TA-FPN), featuring two key modules: the Two-layer Attention Module (TAM) and the Small Object Detail Enhancement Module (SODEM). TAM uses an attention module to make the network focus more on the semantic information of the object and fuses it into the lower layer, so that each layer contains similar semantic information; this alleviates the problem of small object information being submerged by semantic gaps between layers. At the same time, SODEM strengthens the local features of the object, suppresses background noise, enhances the details of small objects, and fuses the enhanced features into the other feature layers so that each layer is rich in small object information, improving small object detection accuracy. Extensive experiments on challenging datasets such as Microsoft Common Objects in Context (MSCOCO) and Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC) demonstrate the validity of the proposed method. Experimental results show a significant improvement in small object detection accuracy compared to state-of-the-art detectors.
Accurate diagnosis of apple leaf diseases is crucial for improving the quality of apple production and promoting the development of the apple industry. However, apple leaf diseases do not differ significantly in image texture and structural information, and the difficulty of extracting disease features against complex backgrounds slows research progress. To address these problems, this paper proposes an improved multi-scale inverse bottleneck residual network model based on a triplet parallel attention mechanism. The model is built upon ResNet-50 and improves and combines the inception module and ResNeXt inverse bottleneck blocks to recognize seven types of apple leaf (six diseases, namely alternaria leaf spot, brown spot, grey spot, mosaic, rust, and scab, plus healthy leaves). First, the 3×3 convolutions in some of the residual modules are replaced by multi-scale residual convolutions: the convolution kernels of different sizes in each branch extract feature maps at different scales, and the branch outputs are fused by summation to enrich the output features. Second, a global layer-wise dynamic coordinated inverse bottleneck structure is used to reduce feature loss in the network; the inverse bottleneck structure loses less image information when transforming between feature spaces of different dimensions. Fusing the multi-scale and layer-wise dynamic coordinated inverse bottlenecks lets the model balance computational efficiency and feature representation capability, and makes it more robust through a combination of horizontal and vertical features in the fine-grained identification of apple leaf diseases. Finally, after each improved module, a triplet parallel attention module is integrated, with cross-dimensional interactions among channels through rotations and residual transformations. This improves the parallel search efficiency for important features and the recognition rate of the network at relatively small computational cost, while also improving the dimensional dependencies. To verify the validity of the model, we uniformly enhance apple leaf disease images screened from the public data sets of Plant Village, Baidu Flying Paddle, and the Internet. The final processed image count is 14,000. An ablation study, a pre-processing comparison, and a method comparison are conducted on the processed datasets. The experimental results demonstrate that the proposed method reaches 98.73% accuracy on the adopted datasets, which is 1.82% higher than the classical ResNet-50 model and 0.29% higher than on the apple leaf disease datasets before preprocessing. It also achieves competitive results in apple leaf disease identification compared to some state-of-the-art methods.
Human posture estimation is a prominent research topic in human-computer interaction, motion recognition, and other intelligent applications. However, the high key point localization accuracy that intelligent applications require is at odds with the low detection accuracy of human posture detection models in practical scenarios. To address this issue, a human pose estimation network called AT-HRNet is proposed, which combines convolutional self-attention and cross-dimensional feature transformation. AT-HRNet adaptively captures significant feature information from various regions and aggregates it through convolutional operations within the local receptive domain. The residual structures TripNeck and TripBlock of the high-resolution network are designed to further refine the key point locations, with the attention weight adjusted by a cross-dimensional interaction to obtain more features. To validate the effectiveness of this network, AT-HRNet was evaluated on the COCO2017 dataset. The results show that AT-HRNet outperforms HRNet, improving mAP by 3.2%, AP75 by 4.0%, and AP^M by 3.9%. This suggests that AT-HRNet can offer more beneficial solutions for human posture estimation.
Aim: To diagnose COVID-19 more efficiently and more correctly, this study proposed a novel attention network for COVID-19 (ANC). Methods: Two datasets were used in this study. An 18-way data augmentation was proposed to avoid overfitting. Then, the convolutional block attention module (CBAM) was integrated into our model, the structure of which is fine-tuned. Finally, Grad-CAM was used to provide an explainable diagnosis. Results: The accuracies of our ANC method on the two datasets are 96.32% ± 1.06% and 96.00% ± 1.03%, respectively. Conclusions: The proposed ANC method is superior to 9 state-of-the-art approaches.
Deep learning technology is widely used in computer vision. Generally, a large amount of data is used to train the model weights in deep learning so as to obtain a model with higher accuracy. However, massive data and complex model structures require more computing resources. Since people generally can only carry and use mobile and portable devices in application scenarios, neural networks face limits on computing resources, size, and power consumption. Therefore, the efficient lightweight model MobileNet is used as the base network for optimization in this study. First, the accuracy of the MobileNet model is improved by adding methods such as the convolutional block attention module (CBAM) and dilated (expansion) convolution. Then, the MobileNet model is compressed using pruning and weight quantization algorithms based on weight magnitude. Afterwards, methods such as Python crawlers and data augmentation are employed to create a garbage classification data set. Based on the above model optimization strategy, a garbage classification mobile application is deployed on mobile phones and Raspberry Pi devices, making the garbage classification task more convenient to carry out.
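The compression step described above can be illustrated with a minimal sketch of magnitude-based pruning followed by weight quantization. The abstract gives no details, so this assumes unstructured pruning of the smallest-magnitude weights and symmetric 8-bit quantization; the function names and the `sparsity` parameter are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest absolute value."""
    k = int(len(weights) * sparsity)
    # Indices sorted from smallest to largest magnitude; drop the first k.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def quantize_int8(weights):
    """Symmetric quantization to integers in [-127, 127] plus a scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate floating-point weights."""
    return [q * scale for q in quantized]
```

Pruning `[0.1, -0.8, 0.05, 0.4]` at `sparsity=0.5` zeroes the two smallest-magnitude entries and leaves `[0.0, -0.8, 0.0, 0.4]`. Quantization then stores each remaining weight as a small integer, trading a little precision for a smaller model.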
To address the poor segmentation performance of existing models on imbalanced data sets with small-scale samples, a bilateral U-Net network model with a spatial attention mechanism is designed. The model uses the lightweight MobileNetV2 as the backbone network for hierarchical feature extraction and proposes an Attentive Pyramid Spatial Attention (APSA) module which, compared with the Attenuated Spatial Pyramid module, increases the receptive field and enhances the information. Finally, a context fusion prediction branch is added that fuses high-semantic and low-semantic prediction results, so that the model effectively improves segmentation accuracy on small data sets. Experimental results on the CamVid data set show that, compared with some existing semantic segmentation networks, the algorithm achieves better segmentation quality and accuracy, with an mIoU of 75.85%. Moreover, to verify the generality of the model and the effectiveness of the APSA module, experiments were conducted on the VOC 2012 data set, where the APSA module improved mIoU by about 12.2%.
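The mIoU figure reported above is the per-class intersection over union averaged across classes. A minimal sketch of the metric on flattened label lists follows; it assumes classes absent from both prediction and ground truth are skipped, which is a common convention but not one the abstract confirms.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union over classes, for flattened per-pixel labels."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:  # skip classes that appear in neither list
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For labels `[0, 0, 1, 1]` and predictions `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.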
Indoor localization methods can help many sectors, such as healthcare centers, smart homes, museums, warehouses, and retail malls, improve their service areas. As a result, it is crucial to look for low-cost methods that can provide exact localization in indoor locations. In this context, image-based localization methods can play an important role in estimating both the position and the orientation of cameras relative to an object. Image-based localization faces many issues, such as image scale and rotation variance, and its accuracy and speed (latency) are two critical factors. This paper proposes an efficient 6-DoF deep-learning model for image-based localization. The model incorporates a channel attention module and a Scale Pyramid Module (SPM), which enhances accuracy while preserving real-time performance. In complex scenes, the channel attention module is employed to distinguish between foreground and background textures. The model adopts the SPM, a feature pyramid module, to deal with image scale and rotation variance. Furthermore, the proposed model employs two regressions (two fully connected layers), one for position and the other for orientation, which increases outcome accuracy. Experiments on standard indoor and outdoor datasets show that the proposed model has a significantly lower Mean Squared Error (MSE) for both position and orientation. On the indoor 7-Scenes dataset, the MSE is reduced to 0.19 m for position and 6.25° for orientation. On the outdoor Cambridge landmarks dataset, the MSE is reduced to 0.63 m for position and 2.03° for orientation. According to the findings, the proposed approach is superior to and more successful than the baseline methods.
Extracting useful details from images is essential for Internet of Things projects. In real life, however, external conditions such as bad weather occlude key target information and distort images, which hinders the extraction of key information, affects the judgment of the real situation in the Internet of Things process, and causes system decision-making errors and accidents. This paper mainly addresses the occlusion caused by rain, removing rain streaks from an image to obtain a clear, rain-free image. To this end, the single image deraining algorithm is studied, and a dual-branch network structure based on an attention module and a convolutional neural network (CNN) module is proposed for rain removal. To derain a single image with high quality, we apply the spatial attention module, channel attention module, and CNN module in the network structure and build the network using an encoder-decoder structure. In the experiments, with structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) as evaluation indexes, the training and testing results on the rain removal dataset show that the proposed structure performs well on the single image deraining task.
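Of the two evaluation indexes named above, PSNR is simple enough to show in a few lines. The sketch below uses the standard definition, 10·log10(MAX² / MSE), on flattened pixel lists and assumes 8-bit images (`max_value=255`); the abstract does not state the bit depth used.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the derained output is closer to the rain-free reference. SSIM, which compares local structure rather than raw pixel error, is more involved and is not sketched here.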
Gear failure is judged from the pitting area ratio of the gear. Traditional gear pitting calculation methods mainly rely on manual visual inspection, which is strongly affected by human factors such as the working experience, level of training, and fatigue of the inspection personnel, so the results may be biased. Non-contact computer vision measurement allows non-destructive testing and monitoring while the machine is in operation and offers high detection accuracy. To improve the measurement accuracy of gear pitting, a novel multi-scale splicing attention U-Net (MSSA U-Net) is explored in this study. An image splicing module is first proposed for concatenating the output feature maps of multiple convolutional layers into a splicing feature map with more semantic information. Then, an attention module is applied to select the key features of the splicing feature map. Because MSSA U-Net makes adequate use of multi-scale semantic features, it segments irregular small objects better than U-Net and attention U-Net. On the basis of the designed visual detection platform and MSSA U-Net, a methodology for measuring the area ratio of gear pitting is proposed. Experimental results on three datasets show that MSSA U-Net is superior to existing typical image segmentation methods and can accurately segment different levels of pitting owing to its strong segmentation ability. Therefore, the proposed methodology can be effectively applied to measuring the pitting area ratio and determining the level of gear pitting.
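Once a segmentation network has produced masks, the pitting area ratio reduces to pixel counting. The sketch below assumes the ratio is defined as pitted pixels divided by tooth-surface pixels and that both masks are binary 2D lists of equal size; the abstract does not spell out the exact definition, so treat this as an illustration only.

```python
def pitting_area_ratio(pitting_mask, tooth_mask):
    """Fraction of the tooth surface covered by pitting.

    Both masks are 2D lists of 0/1 values with the same shape
    (hypothetical outputs of a segmentation model).
    """
    pitted = sum(
        1
        for prow, trow in zip(pitting_mask, tooth_mask)
        for p, t in zip(prow, trow)
        if p and t  # count pitting only where it lies on the tooth surface
    )
    tooth = sum(1 for trow in tooth_mask for t in trow if t)
    if tooth == 0:
        raise ValueError("tooth mask is empty")
    return pitted / tooth
```

For a 2×2 tooth mask that is fully set and a pitting mask with one pixel set, the ratio is 0.25.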
Almost all existing deep learning methods rely on a large amount of annotated data, so they are inappropriate for forest fire smoke detection with limited data. In this paper, a novel hybrid attention-based few-shot learning method, named Attention-Based Prototypical Network, is proposed for forest fire smoke detection. Specifically, the feature extraction network, which incorporates the convolutional block attention module, extracts high-level and discriminative features and further decreases the false alarm rate caused by suspected smoke areas. Moreover, we design a meta-learning module to alleviate the overfitting caused by limited smoke images; the meta-learning network achieves effective detection by comparing the distance between the class prototypes of the support images and the features of the query images. A series of experiments on forest fire smoke datasets and the miniImageNet dataset show that the proposed method is superior to state-of-the-art few-shot learning approaches.
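The prototype comparison described above can be sketched as follows. This assumes the usual prototypical-network recipe (a class prototype is the mean of that class's support embeddings, and a query takes the label of the nearest prototype under Euclidean distance); the embeddings and the labels `"smoke"` and `"background"` are made up for illustration, and the attention-based feature extractor is not modelled.

```python
import math

def class_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class into one prototype."""
    sums, counts = {}, {}
    for emb, label in zip(support_embeddings, support_labels):
        acc = sums.setdefault(label, [0.0] * len(emb))
        for i, v in enumerate(emb):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(query_embedding, prototypes):
    """Assign the query to the class whose prototype is closest."""
    return min(prototypes, key=lambda label: math.dist(query_embedding, prototypes[label]))
```

With support embeddings `[[0, 0], [0, 2], [10, 10], [12, 10]]` labelled background, background, smoke, smoke, the prototypes are `[0, 1]` and `[11, 10]`, and a query at `[9, 9]` is classified as smoke.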
Whole-brain functional connectivity (FC) patterns obtained from resting-state functional magnetic resonance imaging (rs-fMRI) have been widely used in the diagnosis of brain disorders such as autism spectrum disorder (ASD). Recently, an increasing number of studies have focused on employing deep learning techniques to analyze FC patterns for brain disease classification. However, the high dimensionality of the FC features and the interpretation of deep learning results are issues that need to be addressed in FC-based brain disease classification. In this paper, we propose a multi-scale attention-based deep neural network (MSA-DNN) model to classify FC patterns for ASD diagnosis. The model is implemented by adding a flexible multi-scale attention (MSA) module to an auto-encoder based backbone DNN; the module can extract multi-scale features of the FC patterns and change the level of attention given to different FCs through continuous learning. The model reinforces the weights of important FC features while suppressing unimportant FCs, ensuring the sparsity of the model weights and enhancing model interpretability. We performed systematic experiments on a large multi-site ASD dataset with both ten-fold and leave-one-site-out cross-validation. Results showed that our model outperformed classical methods in brain disease classification and revealed robust inter-site prediction performance. We also localized important FC features and brain regions associated with ASD classification. Overall, our study further promotes biomarker detection and computer-aided classification for ASD diagnosis, and the proposed MSA module is flexible and easy to implement in other classification networks.
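Leave-one-site-out cross-validation, used above to test inter-site robustness, holds out all subjects from one acquisition site per fold. A minimal sketch of the split logic is shown below; the site names in the usage note are hypothetical and the model training itself is omitted.

```python
def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices) for each site.

    `sites` lists the acquisition site of each subject, in subject order.
    """
    for held_out in sorted(set(sites)):
        train = [i for i, s in enumerate(sites) if s != held_out]
        test = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train, test
```

For `sites = ["A", "A", "B", "C"]` the first fold trains on subjects 2 and 3 and tests on subjects 0 and 1. Because no subject from the test site is seen in training, the score reflects how well the model transfers to an unseen scanner or protocol.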
Funding (multi-modal image translation study): the National Natural Science Foundation of China (No. 61976080); the Academic Degrees & Graduate Education Reform Project of Henan Province (No. 2021SJGLX195Y); the Teaching Reform Research and Practice Project of Henan Undergraduate Universities (No. 2022SYJXLX008); the Key Project on Research and Practice of Henan University Graduate Education and Teaching Reform (No. YJSJG2023XJ006).
Funding (railway foreign object intrusion detection study): supported in part by the Science and Technology Innovation Project of CHN Energy Shuo Huang Railway Development Company Ltd (No. SHTL-22-28), the Beijing Natural Science Foundation Fengtai Urban Rail Transit Frontier Research Joint Fund (No. L231002), and the Major Project of China State Railway Group Co., Ltd. (No. K2023T003).
文摘The detection of foreign object intrusion is crucial for ensuring the safety of railway operations.To address challenges such as low efficiency,suboptimal detection accuracy,and slow detection speed inherent in conventional comprehensive video monitoring systems for railways,a railway foreign object intrusion recognition and detection system is conceived and implemented using edge computing and deep learning technologies.In a bid to raise detection accuracy,the convolutional block attention module(CBAM),including spatial and channel attention modules,is seamlessly integrated into the YOLOv5 model,giving rise to the CBAM-YOLOv5 model.Furthermore,the distance intersection-over-union_non-maximum suppression(DIo U_NMS)algorithm is employed in lieu of the weighted nonmaximum suppression algorithm,resulting in improved detection performance for intrusive targets.To accelerate detection speed,the model undergoes pruning based on the batch normalization(BN)layer,and Tensor RT inference acceleration techniques are employed,culminating in the successful deployment of the algorithm on edge devices.The CBAM-YOLOv5 model exhibits a notable 2.1%enhancement in detection accuracy when evaluated on a selfconstructed railway dataset,achieving 95.0%for mean average precision(m AP).Furthermore,the inference speed on edge devices attains a commendable 15 frame/s.
Abstract: Effective small object detection is crucial in various applications, including urban intelligent transportation and pedestrian detection. However, small objects are difficult to detect accurately because they contain less information. Many current methods, particularly those based on the Feature Pyramid Network (FPN), address this challenge by leveraging multi-scale feature fusion. However, existing FPN-based methods often suffer from inadequate feature fusion due to varying resolutions across layers, leading to suboptimal small object detection. To address this problem, we propose the Two-layer Attention Feature Pyramid Network (TA-FPN), featuring two key modules: the Two-layer Attention Module (TAM) and the Small Object Detail Enhancement Module (SODEM). TAM uses an attention module to focus the network on the semantic information of the object and fuses it into the lower layer, so that each layer contains similar semantic information. This alleviates the problem of small object information being submerged by semantic gaps between layers. SODEM strengthens the local features of the object, suppresses background noise, enhances the details of small objects, and fuses the enhanced features into the other feature layers, so that each layer is rich in small object information and small object detection accuracy improves. Extensive experiments on challenging datasets such as Microsoft Common Objects in Context (MS COCO) and Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC) demonstrate the validity of the proposed method. Experimental results show a significant improvement in small object detection accuracy compared to state-of-the-art detectors.
Funding: Supported in part by the General Program of the Hunan Provincial Natural Science Foundation of 2022, China (2022JJ31022), the Undergraduate Education Reform Project of Hunan Province, China (HNJG-20210532), and the National Natural Science Foundation of China (62276276).
Abstract: Accurate diagnosis of apple leaf diseases is crucial for improving the quality of apple production and promoting the development of the apple industry. However, apple leaf diseases do not differ significantly in image texture and structural information, and the difficulty of extracting disease features against complex backgrounds slows related research. To address these problems, this paper proposes an improved multi-scale inverse bottleneck residual network model based on a triplet parallel attention mechanism. The model is built upon ResNet-50 and improves and combines the Inception module and ResNeXt inverse bottleneck blocks to recognize seven classes of apple leaf (six diseases, namely Alternaria leaf spot, brown spot, grey spot, mosaic, rust, and scab, plus healthy leaves). First, the 3×3 convolutions in some of the residual modules are replaced by multi-scale residual convolutions. The convolution kernels of different sizes in each branch of the multi-scale convolution extract feature maps of different sizes, and the branch outputs are fused by summation to enrich the output features of the images. Second, the global layer-wise dynamic coordinated inverse bottleneck structure is used to reduce network feature loss, since the inverse bottleneck structure loses less image information when transforming between feature spaces of different dimensions. Fusing the multi-scale and layer-wise dynamic coordinated inverse bottlenecks lets the model balance computational efficiency and feature representation capability, and makes it more robust in the fine-grained identification of apple leaf diseases by combining horizontal and vertical features.
Finally, after each improved module, a triplet parallel attention module is integrated, with cross-dimensional interactions among channels through rotations and residual transformations. This improves the parallel search efficiency for important features and the recognition rate of the network at relatively small computational cost, while also improving the dimensional dependencies. To verify the validity of the model, we uniformly enhance apple leaf disease images screened from the public datasets of Plant Village, Baidu Flying Paddle, and the Internet. The final processed image count is 14,000. An ablation study, a pre-processing comparison, and a method comparison are conducted on the processed datasets. The experimental results demonstrate that the proposed method reaches 98.73% accuracy on the adopted datasets, which is 1.82% higher than the classical ResNet-50 model and 0.29% higher than on the apple leaf disease datasets before preprocessing. It also achieves competitive results in apple leaf disease identification compared to some state-of-the-art methods.
Funding: The National Natural Science Foundation of China (No. 61975015) and the Research and Innovation Project for Graduate Students at Zhongyuan University of Technology (No. YKY2024ZK14).
Abstract: Human posture estimation is a prominent research topic in the fields of human-computer interaction, motion recognition, and other intelligent applications. However, the high key point localization accuracy that intelligent applications require is at odds with the low detection accuracy of human posture detection models in practical scenarios. To address this issue, a human pose estimation network called AT-HRNet is proposed, which combines convolutional self-attention and cross-dimensional feature transformation. AT-HRNet adaptively captures significant feature information from various regions and aggregates it through convolutional operations within the local receptive domain. The residual structures TripNeck and TripBlock of the high-resolution network are designed to further refine the key point locations, with the attention weight adjusted by a cross-dimensional interaction to obtain more features. To validate the effectiveness of this network, AT-HRNet was evaluated on the COCO2017 dataset. The results show that AT-HRNet outperforms HRNet, improving mAP by 3.2%, AP75 by 4.0%, and AP^M by 3.9%. This suggests that AT-HRNet can offer more beneficial solutions for human posture estimation.
Funding: This paper is partially supported by the Open Fund for Jiangsu Key Laboratory of Advanced Manufacturing Technology (HGAMTL-1703); Guangxi Key Laboratory of Trusted Software (kx201901); Fundamental Research Funds for the Central Universities (CDLS-2020-03); Key Laboratory of Child Development and Learning Science (Southeast University), Ministry of Education; Royal Society International Exchanges Cost Share Award, UK (RP202G0230); Medical Research Council Confidence in Concept Award, UK (MC_PC_17171); Hope Foundation for Cancer Research, UK (RM60G0680); British Heart Foundation Accelerator Award, UK.
Abstract: Aim: To diagnose COVID-19 more efficiently and more accurately, this study proposes a novel attention network for COVID-19 (ANC). Methods: Two datasets were used in this study. An 18-way data augmentation was proposed to avoid overfitting. Then, the convolutional block attention module (CBAM) was integrated into our model, the structure of which was fine-tuned. Finally, Grad-CAM was used to provide an explainable diagnosis. Results: The accuracies of our ANC method on the two datasets are 96.32% ± 1.06% and 96.00% ± 1.03%, respectively. Conclusions: The proposed ANC method is superior to 9 state-of-the-art approaches.
Abstract: Deep learning technology is widely used in computer vision. Generally, a large amount of data is used to train the model weights in deep learning so as to obtain a model with higher accuracy. However, massive data and complex model structures require more computing resources. Since people generally can only carry and use mobile and portable devices in application scenarios, neural networks are limited in terms of computing resources, size, and power consumption. Therefore, the efficient lightweight model MobileNet is used as the base network for optimization in this study. First, the accuracy of the MobileNet model is improved by adding methods such as the convolutional block attention module (CBAM) and expansion convolution. Then, the MobileNet model is compressed using pruning and weight quantization algorithms based on weight size. Afterwards, methods such as Python crawlers and data augmentation are employed to create a garbage classification dataset. Based on the above model optimization strategy, the garbage classification application is deployed on mobile phones and Raspberry Pi devices, making the garbage classification task more convenient to complete.
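The compression step described in this abstract (pruning by weight size followed by weight quantization) can be sketched on a flat list of weights. This is a generic illustration of magnitude pruning and symmetric uniform quantization, not the paper's algorithm; the function names, the sparsity level, and the 8-bit setting are assumptions.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest absolute value."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_uniform(weights, bits=8):
    """Symmetric uniform quantization; returns (integer codes, scale)."""
    qmax = 2 ** (bits - 1) - 1
    largest = max((abs(w) for w in weights), default=0.0)
    scale = largest / qmax if largest > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


# Hypothetical layer weights, for illustration only.
weights = [0.5, -0.1, 0.02, -0.9]
pruned = prune_by_magnitude(weights, sparsity=0.5)
codes, scale = quantize_uniform(pruned, bits=8)
print(pruned, codes)
```

Multiplying each integer code by `scale` recovers an approximation of the pruned weights, so only the codes and one scale value need to be stored on the device.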
Funding: Ministry of Science and Technology Basic Resources Survey Special Project, Grant/Award Number: 2019FY100900; High-level Hospital Construction Project, Grant/Award Number: DFJH2019015; National Natural Science Foundation of China, Grant/Award Number: 61871021; Guangdong Natural Science Foundation, Grant/Award Number: 2019A1515011676; Beijing Key Laboratory of Robotics Bionic and Functional Research.
Abstract: To address the poor segmentation performance of existing models on imbalanced datasets with small-scale samples, a bilateral U-Net model with a spatial attention mechanism is designed. The model uses the lightweight MobileNetV2 as the backbone network for hierarchical feature extraction and proposes an Attentive Pyramid Spatial Attention (APSA) module which, compared with the Attenuated Spatial Pyramid module, can enlarge the receptive field and enhance the information. Finally, a context fusion prediction branch is added that fuses high-semantic and low-semantic prediction results, so that the model effectively improves segmentation accuracy on small datasets. Experimental results on the CamVid dataset show that, compared with some existing semantic segmentation networks, the algorithm achieves better segmentation quality and accuracy, with an mIOU of 75.85%. Moreover, to verify the generality of the model and the effectiveness of the APSA module, experiments were conducted on the VOC 2012 dataset, where the APSA module improved mIOU by about 12.2%.
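The mIOU figure reported in this abstract is the mean, over classes, of the per-class intersection over union. A minimal sketch computed from a confusion matrix follows; the function name and the toy matrix are hypothetical, and skipping classes that never occur is an assumption about the evaluation protocol rather than something the abstract states.

```python
def mean_iou(confusion):
    """Mean IoU from a square matrix where confusion[i][j] counts pixels of
    true class i that were predicted as class j."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                         # true class c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp    # predicted c, truly another class
        denom = tp + fp + fn
        if denom > 0:  # ignore classes absent from both labels and predictions
            ious.append(tp / denom)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return sum(ious) / len(ious)


# Hypothetical two-class confusion matrix, for illustration only.
confusion = [[8, 2],
             [1, 9]]
print(mean_iou(confusion))
```

Averaging per class rather than per pixel gives rare classes the same weight as frequent ones, which is why the metric suits imbalanced datasets.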
Funding: This work was funded by the Deanship of Scientific Research at Jouf University under grant No. DSR-2021-02-0379.
Abstract: Indoor localization methods can help many sectors, such as healthcare centers, smart homes, museums, warehouses, and retail malls, improve their service areas. It is therefore crucial to look for low-cost methods that can provide exact localization in indoor locations. In this context, image-based localization methods can play an important role in estimating both the position and the orientation of cameras relative to an object. Image-based localization faces many issues, such as image scale and rotation variance, and its accuracy and speed (latency) are two critical factors. This paper proposes an efficient 6-DoF deep-learning model for image-based localization. The model incorporates a channel attention module and the Scale Pyramid Module (SPM), which enhances accuracy while preserving the model's real-time performance. In complex scenes, the channel attention module is employed to distinguish between the textures of foregrounds and backgrounds. Our model adopts an SPM, a feature pyramid module, to deal with image scale and rotation variance. Furthermore, the proposed model employs two regressions (two fully connected layers), one for position and the other for orientation, which increases outcome accuracy. Experiments on standard indoor and outdoor datasets show that the proposed model has a significantly lower Mean Squared Error (MSE) for both position and orientation. On the indoor 7-Scenes dataset, the MSE is reduced to 0.19 m for position and 6.25° for orientation. On the outdoor Cambridge Landmarks dataset, the MSE is reduced to 0.63 m for position and 2.03° for orientation. According to these findings, the proposed approach is superior to and more successful than the baseline methods.
Funding: Supported by the National Natural Science Foundation of China (No. 62001272).
Abstract: Extracting useful details from images is essential for Internet of Things projects. In real life, however, external conditions such as bad weather occlude key target information and distort images. This hinders the extraction of key information, affects the judgment of the real situation in Internet of Things processes, and can cause system decision-making errors and accidents. This paper addresses the problem of rain occluding the image: rain streaks are removed to obtain a clear, rain-free image. To this end, single image deraining is studied, and a dual-branch network structure based on an attention module and a convolutional neural network (CNN) module is proposed to accomplish the rain removal task. To remove rain from a single image with high quality, we apply the spatial attention module, the channel attention module, and the CNN module to the network structure, and build the network using an encoder-decoder structure. In the experiments, with structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) as evaluation indexes, the training and testing results on the rain removal dataset show that the proposed structure performs well on the single image deraining task.
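Of the two evaluation indexes named in this abstract, PSNR is simple enough to sketch directly: it is 10·log10(MAX² / MSE), where MSE is the mean squared pixel error against the clean reference. The function name, the 8-bit peak value of 255, and the toy images are assumptions for illustration; SSIM is omitted because it needs local window statistics.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two same-shaped 2D grayscale images."""
    ref_pixels = [p for row in reference for p in row]
    out_pixels = [p for row in restored for p in row]
    if len(ref_pixels) != len(out_pixels) or not ref_pixels:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(ref_pixels, out_pixels)) / len(ref_pixels)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 images: clean reference and a derained output.
clean = [[100, 100], [100, 100]]
derained = [[110, 100], [100, 90]]
print(psnr(clean, derained))
```

A higher PSNR means the derained output is closer to the rain-free reference in a pixel-wise sense, which is why it is usually reported together with a structural measure such as SSIM.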
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62033001 and 52175075) and the Chongqing Municipal Graduate Scientific Research and Innovation Foundation of China (Grant No. CYB21010).
Abstract: Gear failure is judged on the basis of the pitting area ratio of the gear. Traditional gear pitting calculation methods mainly rely on manual visual inspection. This approach is strongly affected by human factors, such as the working experience, training level, and fatigue of the inspection personnel, so the results may be biased. Non-contact computer vision measurement can carry out non-destructive testing and monitoring while the machine is operating, and offers high detection accuracy. To improve the measurement accuracy of gear pitting, a novel multi-scale splicing attention U-Net (MSSA U-Net) is explored in this study. An image splicing module is first proposed to concatenate the output feature maps of multiple convolutional layers into a splicing feature map with more semantic information. Then, an attention module is applied to select the key features of the splicing feature map. Because MSSA U-Net makes full use of multi-scale semantic features, it achieves better segmentation performance on irregular small objects than U-Net and attention U-Net. Based on the designed visual detection platform and MSSA U-Net, a methodology for measuring the area ratio of gear pitting is proposed. Experimental results on three datasets show that MSSA U-Net is superior to existing typical image segmentation methods and can accurately segment different levels of pitting owing to its strong segmentation ability. Therefore, the proposed methodology can be effectively applied to measuring the pitting area ratio and determining the level of gear pitting.
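Once a segmentation network has produced masks, the pitting area ratio reduces to a pixel count. The sketch below assumes the ratio is defined as pitted pixels divided by tooth-surface pixels and that both masks are binary and aligned; the abstract does not spell out the exact definition, and the function name and toy masks are hypothetical.

```python
def pitting_area_ratio(pitting_mask, tooth_mask):
    """Fraction of tooth-surface pixels that are also marked as pitted.

    Both masks are 2D lists of 0/1 values with the same shape.
    """
    tooth_pixels = 0
    pitted_pixels = 0
    for pit_row, tooth_row in zip(pitting_mask, tooth_mask):
        for pit, tooth in zip(pit_row, tooth_row):
            if tooth:
                tooth_pixels += 1
                if pit:  # count pitting only inside the tooth surface
                    pitted_pixels += 1
    if tooth_pixels == 0:
        raise ValueError("tooth mask is empty")
    return pitted_pixels / tooth_pixels


# Hypothetical 2x4 masks, for illustration only.
tooth = [[1, 1, 1, 1],
         [1, 1, 1, 1]]
pitting = [[0, 1, 0, 0],
           [0, 1, 0, 0]]
print(pitting_area_ratio(pitting, tooth))
```

Restricting the count to the tooth mask keeps false positives in the background from inflating the ratio.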
Funding: The work was supported by the National Key R&D Program of China (Grant No. 2020YFC1511601) and the Fundamental Research Funds for the Central Universities (Grant No. 2019SHFWLC01).
Abstract: Almost all existing deep learning methods rely on a large amount of annotated data, so they are ill-suited to forest fire smoke detection with limited data. In this paper, a novel hybrid attention-based few-shot learning method, named Attention-Based Prototypical Network, is proposed for forest fire smoke detection. Specifically, the feature extraction network, which incorporates the convolutional block attention module, extracts high-level, discriminative features and further decreases the false alarm rate caused by suspected smoke areas. Moreover, we design a meta-learning module to alleviate the overfitting caused by limited smoke images. The meta-learning network achieves effective detection by comparing the distance between the class prototypes of the support images and the features of the query images. A series of experiments on forest fire smoke datasets and the miniImageNet dataset show that the proposed method is superior to state-of-the-art few-shot learning approaches.
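The prototype comparison described in this abstract can be sketched without a neural network by treating embeddings as plain vectors. Each class prototype is taken as the mean of its support embeddings, and a query is assigned to the nearest prototype under squared Euclidean distance, as in a standard prototypical network. The function names, labels, and embedding values are hypothetical, and the attention-based feature extractor is not modelled.

```python
def class_prototypes(support):
    """Map each label to the mean of its support embedding vectors."""
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [sum(v[d] for v in vectors) / len(vectors)
                             for d in range(dim)]
    return prototypes


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Hypothetical 2D embeddings for a two-way, two-shot episode.
support = {
    "smoke": [[1.0, 1.0], [3.0, 1.0]],
    "no_smoke": [[-1.0, -1.0], [-1.0, -3.0]],
}
prototypes = class_prototypes(support)
print(classify([1.5, 0.5], prototypes))
```

Because classification only needs a mean and a distance, new classes can be added from a handful of support images without retraining the embedding network.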
Funding: This work was supported by the National Natural Science Foundation of China (No. 61906006).
Abstract: Whole-brain functional connectivity (FC) patterns obtained from resting-state functional magnetic resonance imaging (rs-fMRI) have been widely used in the diagnosis of brain disorders such as autism spectrum disorder (ASD). Recently, an increasing number of studies have focused on employing deep learning techniques to analyze FC patterns for brain disease classification. However, the high dimensionality of the FC features and the interpretation of deep learning results are issues that need to be addressed in FC-based brain disease classification. In this paper, we propose a multi-scale attention-based deep neural network (MSA-DNN) model to classify FC patterns for ASD diagnosis. The model is implemented by adding a flexible multi-scale attention (MSA) module to an auto-encoder-based backbone DNN, which can extract multi-scale features of the FC patterns and change the level of attention paid to different FCs through continuous learning. Our model reinforces the weights of important FC features while suppressing unimportant FCs, which ensures the sparsity of the model weights and enhances model interpretability. We performed systematic experiments on a large multi-site ASD dataset with both ten-fold and leave-one-site-out cross-validation. Results showed that our model outperformed classical methods in brain disease classification and exhibited robust inter-site prediction performance. We also localized important FC features and brain regions associated with ASD classification. Overall, our study further promotes biomarker detection and computer-aided classification for ASD diagnosis, and the proposed MSA module is flexible and easy to implement in other classification networks.
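The leave-one-site-out protocol mentioned in this abstract holds out all subjects from one acquisition site per fold and trains on the rest. A minimal sketch of the split logic follows; the function name and site labels are hypothetical, and the abstract does not describe how the authors implemented their splits.

```python
def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices) for each distinct site.

    `sites` holds one site label per subject, in subject order.
    """
    for held_out in sorted(set(sites)):
        test = [i for i, s in enumerate(sites) if s == held_out]
        train = [i for i, s in enumerate(sites) if s != held_out]
        yield held_out, train, test


# Hypothetical site labels for five subjects.
sites = ["A", "A", "B", "C", "B"]
folds = list(leave_one_site_out(sites))
for held_out, train, test in folds:
    print(held_out, train, test)
```

Unlike ten-fold cross-validation, no site contributes to both training and testing within a fold, so the resulting score reflects how well a model transfers to data from an unseen site.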