The unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain.However,the multi-genera...The unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain.However,the multi-generator mechanism is employed among the advanced approaches available to model different domain mappings,which results in inefficient training of neural networks and pattern collapse,leading to inefficient generation of image diversity.To address this issue,this paper introduces a multi-modal unsupervised image translation framework that uses a generator to perform multi-modal image translation.Specifically,firstly,the domain code is introduced in this paper to explicitly control the different generation tasks.Secondly,this paper brings in the squeeze-and-excitation(SE)mechanism and feature attention(FA)module.Finally,the model integrates multiple optimization objectives to ensure efficient multi-modal translation.This paper performs qualitative and quantitative experiments on multiple non-paired benchmark image translation datasets while demonstrating the benefits of the proposed method over existing technologies.Overall,experimental results have shown that the proposed method is versatile and scalable.展开更多
The infrastructure and construction of roads are crucial for the economic and social development of a region,but traffic-related challenges like accidents and congestion persist.Artificial Intelligence(AI)and Machine ...The infrastructure and construction of roads are crucial for the economic and social development of a region,but traffic-related challenges like accidents and congestion persist.Artificial Intelligence(AI)and Machine Learning(ML)have been used in road infrastructure and construction,particularly with the Internet of Things(IoT)devices.Object detection in Computer Vision also plays a key role in improving road infrastructure and addressing trafficrelated problems.This study aims to use You Only Look Once version 7(YOLOv7),Convolutional Block Attention Module(CBAM),the most optimized object-detection algorithm,to detect and identify traffic signs,and analyze effective combinations of adaptive optimizers like Adaptive Moment estimation(Adam),Root Mean Squared Propagation(RMSprop)and Stochastic Gradient Descent(SGD)with the YOLOv7.Using a portion of German traffic signs for training,the study investigates the feasibility of adopting smaller datasets while maintaining high accuracy.The model proposed in this study not only improves traffic safety by detecting traffic signs but also has the potential to contribute to the rapid development of autonomous vehicle systems.The study results showed an impressive accuracy of 99.7%when using a batch size of 8 and the Adam optimizer.This high level of accuracy demonstrates the effectiveness of the proposed model for the image classification task of traffic sign recognition.展开更多
Medical image classification has played an important role in the medical field, and the related method based on deep learning has become an important and powerful technique in medical image classification. In this art...Medical image classification has played an important role in the medical field, and the related method based on deep learning has become an important and powerful technique in medical image classification. In this article, we propose a simplified inception module based Hadamard attention (SI + HA) mechanism for medical image classification. Specifically, we propose a new attention mechanism: Hadamard attention mechanism. It improves the accuracy of medical image classification without greatly increasing the complexity of the model. Meanwhile, we adopt a simplified inception module to improve the utilization of parameters. We use two medical image datasets to prove the superiority of our proposed method. In the BreakHis dataset, the AUCs of our method can reach 98.74%, 98.38%, 98.61% and 97.67% under the magnification factors of 40×, 100×, 200× and 400×, respectively. The accuracies can reach 95.67%, 94.17%, 94.53% and 94.12% under the magnification factors of 40×, 100×, 200× and 400×, respectively. In the KIMIA Path 960 dataset, the AUCs and accuracy of our method can reach 99.91% and 99.03%. It is superior to the currently popular methods and can significantly improve the effectiveness of medical image classification.展开更多
Effective small object detection is crucial in various applications including urban intelligent transportation and pedestrian detection.However,small objects are difficult to detect accurately because they contain les...Effective small object detection is crucial in various applications including urban intelligent transportation and pedestrian detection.However,small objects are difficult to detect accurately because they contain less information.Many current methods,particularly those based on Feature Pyramid Network(FPN),address this challenge by leveraging multi-scale feature fusion.However,existing FPN-based methods often suffer from inadequate feature fusion due to varying resolutions across different layers,leading to suboptimal small object detection.To address this problem,we propose the Two-layerAttention Feature Pyramid Network(TA-FPN),featuring two key modules:the Two-layer Attention Module(TAM)and the Small Object Detail Enhancement Module(SODEM).TAM uses the attention module to make the network more focused on the semantic information of the object and fuse it to the lower layer,so that each layer contains similar semantic information,to alleviate the problem of small object information being submerged due to semantic gaps between different layers.At the same time,SODEM is introduced to strengthen the local features of the object,suppress background noise,enhance the information details of the small object,and fuse the enhanced features to other feature layers to ensure that each layer is rich in small object information,to improve small object detection accuracy.Our extensive experiments on challenging datasets such as Microsoft Common Objects inContext(MSCOCO)and Pattern Analysis Statistical Modelling and Computational Learning,Visual Object Classes(PASCAL VOC)demonstrate the validity of the proposedmethod.Experimental results show a significant improvement in small object detection accuracy compared to state-of-theart detectors.展开更多
Accurate diagnosis of apple leaf diseases is crucial for improving the quality of apple production and promoting the development of the apple industry. However, apple leaf diseases do not differ significantly from ima...Accurate diagnosis of apple leaf diseases is crucial for improving the quality of apple production and promoting the development of the apple industry. However, apple leaf diseases do not differ significantly from image texture and structural information. The difficulties in disease feature extraction in complex backgrounds slow the related research progress. To address the problems, this paper proposes an improved multi-scale inverse bottleneck residual network model based on a triplet parallel attention mechanism, which is built upon ResNet-50, while improving and combining the inception module and ResNext inverse bottleneck blocks, to recognize seven types of apple leaf(including six diseases of alternaria leaf spot, brown spot, grey spot, mosaic, rust, scab, and one healthy). First, the 3×3 convolutions in some of the residual modules are replaced by multi-scale residual convolutions, the convolution kernels of different sizes contained in each branch of the multi-scale convolution are applied to extract feature maps of different sizes, and the outputs of these branches are multi-scale fused by summing to enrich the output features of the images. Second, the global layer-wise dynamic coordinated inverse bottleneck structure is used to reduce the network feature loss. The inverse bottleneck structure makes the image information less lossy when transforming from different dimensional feature spaces. The fusion of multi-scale and layer-wise dynamic coordinated inverse bottlenecks makes the model effectively balances computational efficiency and feature representation capability, and more robust with a combination of horizontal and vertical features in the fine identification of apple leaf diseases. Finally, after each improved module, a triplet parallel attention module is integrated with cross-dimensional interactions among channels through rotations and residual transformations, which improves the parallel search efficiency of important features and the recognition rate of the network with relatively small computational costs while the dimensional dependencies are improved. To verify the validity of the model in this paper, we uniformly enhance apple leaf disease images screened from the public data sets of Plant Village, Baidu Flying Paddle, and the Internet. The final processed image count is 14,000. The ablation study, pre-processing comparison, and method comparison are conducted on the processed datasets. The experimental results demonstrate that the proposed method reaches 98.73% accuracy on the adopted datasets, which is 1.82% higher than the classical ResNet-50 model, and 0.29% better than the apple leaf disease datasets before preprocessing. It also achieves competitive results in apple leaf disease identification compared to some state-ofthe-art methods.展开更多
Aim: To diagnose COVID-19 more efficiently and more correctly, this study proposed a novel attention network forCOVID-19 (ANC). Methods: Two datasets were used in this study. An 18-way data augmentation was proposed t...Aim: To diagnose COVID-19 more efficiently and more correctly, this study proposed a novel attention network forCOVID-19 (ANC). Methods: Two datasets were used in this study. An 18-way data augmentation was proposed toavoid overfitting. Then, convolutional block attention module (CBAM) was integrated to our model, the structureof which is fine-tuned. Finally, Grad-CAM was used to provide an explainable diagnosis. Results: The accuracyof our ANC methods on two datasets are 96.32% ± 1.06%, and 96.00% ± 1.03%, respectively. Conclusions: Thisproposed ANC method is superior to 9 state-of-the-art approaches.展开更多
Aiming at the problem that the existing models have a poor segmentation effect on imbalanced data sets with small-scale samples,a bilateral U-Net network model with a spatial attention mechanism is designed.The model ...Aiming at the problem that the existing models have a poor segmentation effect on imbalanced data sets with small-scale samples,a bilateral U-Net network model with a spatial attention mechanism is designed.The model uses the lightweight MobileNetV2 as the backbone network for feature hierarchical extraction and proposes an Attentive Pyramid Spatial Attention(APSA)module compared to the Attenuated Spatial Pyramid module,which can increase the receptive field and enhance the information,and finally adds the context fusion prediction branch that fuses high-semantic and low-semantic prediction results,and the model effectively improves the segmentation accuracy of small data sets.The experimental results on the CamVid data set show that compared with some existing semantic segmentation networks,the algorithm has a better segmentation effect and segmentation accuracy,and its mIOU reaches 75.85%.Moreover,to verify the generality of the model and the effectiveness of the APSA module,experiments were conducted on the VOC 2012 data set,and the APSA module improved mIOU by about 12.2%.展开更多
Extracting useful details from images is essential for the Internet of Things project.However,in real life,various external environments,such as badweather conditions,will cause the occlusion of key target information...Extracting useful details from images is essential for the Internet of Things project.However,in real life,various external environments,such as badweather conditions,will cause the occlusion of key target information and image distortion,resulting in difficulties and obstacles to the extraction of key information,affecting the judgment of the real situation in the process of the Internet of Things,and causing system decision-making errors and accidents.In this paper,we mainly solve the problem of rain on the image occlusion,remove the rain grain in the image,and get a clear image without rain.Therefore,the single image deraining algorithm is studied,and a dual-branch network structure based on the attention module and convolutional neural network(CNN)module is proposed to accomplish the task of rain removal.In order to complete the rain removal of a single image with high quality,we apply the spatial attention module,channel attention module and CNN module to the network structure,and build the network using the coder-decoder structure.In the experiment,with the structural similarity(SSIM)and the peak signal-to-noise ratio(PSNR)as evaluation indexes,the training and testing results on the rain removal dataset show that the proposed structure has a good effect on the single image deraining task.展开更多
The judgment of gear failure is based on the pitting area ratio of gear.Traditional gear pitting calculation method mainly rely on manual visual inspection.This method is greatly affected by human factors,and is great...The judgment of gear failure is based on the pitting area ratio of gear.Traditional gear pitting calculation method mainly rely on manual visual inspection.This method is greatly affected by human factors,and is greatly affected by the working experience,training degree and fatigue degree of the detection personnel,so the detection results may be biased.The non-contact computer vision measurement can carry out non-destructive testing and monitoring under the working condition of the machine,and has high detection accuracy.To improve the measurement accuracy of gear pitting,a novel multi-scale splicing attention U-Net(MSSA U-Net)is explored in this study.An image splicing module is first proposed for concatenating the output feature maps of multiple convolutional layers into a splicing feature map with more semantic information.Then,an attention module is applied to select the key features of the splicing feature map.Given that MSSA U-Net adequately uses multi-scale semantic features,it has better segmentation performance on irregular small objects than U-Net and attention U-Net.On the basis of the designed visual detection platform and MSSA U-Net,a methodology for measuring the area ratio of gear pitting is proposed.With three datasets,experimental results show that MSSA U-Net is superior to existing typical image segmentation methods and can accurately segment different levels of pitting due to its strong segmentation ability.Therefore,the proposed methodology can be effectively applied in measuring the pitting area ratio and determining the level of gear pitting.展开更多
Indoor localization methods can help many sectors,such as healthcare centers,smart homes,museums,warehouses,and retail malls,improve their service areas.As a result,it is crucial to look for low-cost methods that can ...Indoor localization methods can help many sectors,such as healthcare centers,smart homes,museums,warehouses,and retail malls,improve their service areas.As a result,it is crucial to look for low-cost methods that can provide exact localization in indoor locations.In this context,imagebased localization methods can play an important role in estimating both the position and the orientation of cameras regarding an object.Image-based localization faces many issues,such as image scale and rotation variance.Also,image-based localization’s accuracy and speed(latency)are two critical factors.This paper proposes an efficient 6-DoF deep-learning model for image-based localization.This model incorporates the channel attention module and the Scale PyramidModule(SPM).It not only enhances accuracy but also ensures the model’s real-time performance.In complex scenes,a channel attention module is employed to distinguish between the textures of the foregrounds and backgrounds.Our model adapted an SPM,a feature pyramid module for dealing with image scale and rotation variance issues.Furthermore,the proposed model employs two regressions(two fully connected layers),one for position and the other for orientation,which increases outcome accuracy.Experiments on standard indoor and outdoor datasets show that the proposed model has a significantly lower Mean Squared Error(MSE)for both position and orientation.On the indoor 7-Scenes dataset,the MSE for the position is reduced to 0.19 m and 6.25°for the orientation.Furthermore,on the outdoor Cambridge landmarks dataset,the MSE for the position is reduced to 0.63 m and 2.03°for the orientation.According to the findings,the proposed approach is superior and more successful than the baseline methods.展开更多
Existing almost deep learning methods rely on a large amount of annotated data, so they are inappropriate for forest fire smoke detection with limited data. In this paper, a novel hybrid attention-based few-shot learn...Existing almost deep learning methods rely on a large amount of annotated data, so they are inappropriate for forest fire smoke detection with limited data. In this paper, a novel hybrid attention-based few-shot learning method, named Attention-Based Prototypical Network, is proposed for forest fire smoke detection. Specifically, feature extraction network, which consists of convolutional block attention module, could extract high-level and discriminative features and further decrease the false alarm rate resulting from suspected smoke areas. Moreover, we design a metalearning module to alleviate the overfitting issue caused by limited smoke images, and the meta-learning network enables achieving effective detection via comparing the distance between the class prototype of support images and the features of query images. A series of experiments on forest fire smoke datasets and miniImageNet dataset testify that the proposed method is superior to state-of-the-art few-shot learning approaches.展开更多
Whole brain functional connectivity(FC)patterns obtained from resting-state functional magnetic resonance imaging(rs-fMRI)have been widely used in the diagnosis of brain disorders such as autism spectrum disorder(ASD)...Whole brain functional connectivity(FC)patterns obtained from resting-state functional magnetic resonance imaging(rs-fMRI)have been widely used in the diagnosis of brain disorders such as autism spectrum disorder(ASD).Recently,an increasing number of studies have focused on employing deep learning techniques to analyze FC patterns for brain disease classification.However,the high dimensionality of the FC features and the interpretation of deep learning results are issues that need to be addressed in the FC-based brain disease classification.In this paper,we proposed a multi-scale attention-based deep neural network(MSA-DNN)model to classify FC patterns for the ASD diagnosis.The model was implemented by adding a flexible multi-scale attention(MSA)module to the auto-encoder based backbone DNN,which can extract multi-scale features of the FC patterns and change the level of attention for different FCs by continuous learning.Our model will reinforce the weights of important FC features while suppress the unimportant FCs to ensure the sparsity of the model weights and enhance the model interpretability.We performed systematic experiments on the large multi-sites ASD dataset with both ten-fold and leaveone-site-out cross-validations.Results showed that our model outperformed classical methods in brain disease classification and revealed robust intersite prediction performance.We also localized important FC features and brain regions associated with ASD classification.Overall,our study further promotes the biomarker detection and computer-aided classification for ASD diagnosis,and the proposed MSA module is flexible and easy to implement in other classification networks.展开更多
The technology for image-to-image style transfer(a prevalent image processing task)has developed rapidly.The purpose of style transfer is to extract a texture from the source image domain and transfer it to the target...The technology for image-to-image style transfer(a prevalent image processing task)has developed rapidly.The purpose of style transfer is to extract a texture from the source image domain and transfer it to the target image domain using a deep neural network.However,the existing methods typically have a large computational cost.To achieve efficient style transfer,we introduce a novel Ghost module into the GANILLA architecture to produce more feature maps from cheap operations.Then we utilize an attention mechanism to transform images with various styles.We optimize the original generative adversarial network(GAN)by using more efficient calculation methods for image-to-illustration translation.The experimental results show that our proposed method is similar to human vision and still maintains the quality of the image.Moreover,our proposed method overcomes the high computational cost and high computational resource consumption for style transfer.By comparing the results of subjective and objective evaluation indicators,our proposed method has shown superior performance over existing methods.展开更多
The non-destructive recognition of coated seeds is crucial for advancing studies in coating theory.Currently,the recognition of coated seeds heavily relies on manual visual inspection and machine vision detection.Howe...The non-destructive recognition of coated seeds is crucial for advancing studies in coating theory.Currently,the recognition of coated seeds heavily relies on manual visual inspection and machine vision detection.However,these methods pose challenges such as high misclassification rates,low recognition efficiency,and elevated labor intensity.In response to the aforementioned challenges,this study leveraged deep learning techniques to develop a coated seed recognition model named YOLO-Coated Seeds Recognition(YOLO-CSR),aiming to address the challenges posed by coated seed recognition tasks.The experiment of this study mainly includes the following steps:First,a seed coating machine was set up to coat red clover seeds,resulting in three types of coated red clover seeds.Subsequently,by collecting images of the three types of coated seeds,a coated seed image dataset was further constructed.Then,the YOLOv5s was built,incorporating the Convolutional Block Attention Module(CBAM)into the model’s backbone to enhance its ability to learn features of coated seeds.Finally,the training results of YOLO-CSR were compared with those of other classical recognition models.The experimental results showed that YOLO-CSR achieved the best recognition performance on the self-built coated seed image dataset.The average precision(AP)for recognizing the three types of coated seeds reached 98.43%,97.91%,and 97.26%,with a mean average precision@0.5(mAP@0.5)of 97.87%.Compared to YOLOv5,YOLO-CSR showed a 1.18%improvement in mAP@0.5.Additionally,YOLO-CSR has a model size of only 14.9 MB,with an average recognition time(ART)of 10.1 ms and a frame per second(FPS)of 99.Experimental results prove that YOLO-CSR can accurately,efficiently,and rapidly recognize coated red clover seeds.The findings of this study provide technical support for the non-destructive recognition of spherical coated seeds.展开更多
基金the National Natural Science Foundation of China(No.61976080)the Academic Degrees&Graduate Education Reform Project of Henan Province(No.2021SJGLX195Y)+1 种基金the Teaching Reform Research and Practice Project of Henan Undergraduate Universities(No.2022SYJXLX008)the Key Project on Research and Practice of Henan University Graduate Education and Teaching Reform(No.YJSJG2023XJ006)。
文摘The unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain.However,the multi-generator mechanism is employed among the advanced approaches available to model different domain mappings,which results in inefficient training of neural networks and pattern collapse,leading to inefficient generation of image diversity.To address this issue,this paper introduces a multi-modal unsupervised image translation framework that uses a generator to perform multi-modal image translation.Specifically,firstly,the domain code is introduced in this paper to explicitly control the different generation tasks.Secondly,this paper brings in the squeeze-and-excitation(SE)mechanism and feature attention(FA)module.Finally,the model integrates multiple optimization objectives to ensure efficient multi-modal translation.This paper performs qualitative and quantitative experiments on multiple non-paired benchmark image translation datasets while demonstrating the benefits of the proposed method over existing technologies.Overall,experimental results have shown that the proposed method is versatile and scalable.
文摘The infrastructure and construction of roads are crucial for the economic and social development of a region,but traffic-related challenges like accidents and congestion persist.Artificial Intelligence(AI)and Machine Learning(ML)have been used in road infrastructure and construction,particularly with the Internet of Things(IoT)devices.Object detection in Computer Vision also plays a key role in improving road infrastructure and addressing trafficrelated problems.This study aims to use You Only Look Once version 7(YOLOv7),Convolutional Block Attention Module(CBAM),the most optimized object-detection algorithm,to detect and identify traffic signs,and analyze effective combinations of adaptive optimizers like Adaptive Moment estimation(Adam),Root Mean Squared Propagation(RMSprop)and Stochastic Gradient Descent(SGD)with the YOLOv7.Using a portion of German traffic signs for training,the study investigates the feasibility of adopting smaller datasets while maintaining high accuracy.The model proposed in this study not only improves traffic safety by detecting traffic signs but also has the potential to contribute to the rapid development of autonomous vehicle systems.The study results showed an impressive accuracy of 99.7%when using a batch size of 8 and the Adam optimizer.This high level of accuracy demonstrates the effectiveness of the proposed model for the image classification task of traffic sign recognition.
文摘Medical image classification has played an important role in the medical field, and the related method based on deep learning has become an important and powerful technique in medical image classification. In this article, we propose a simplified inception module based Hadamard attention (SI + HA) mechanism for medical image classification. Specifically, we propose a new attention mechanism: Hadamard attention mechanism. It improves the accuracy of medical image classification without greatly increasing the complexity of the model. Meanwhile, we adopt a simplified inception module to improve the utilization of parameters. We use two medical image datasets to prove the superiority of our proposed method. In the BreakHis dataset, the AUCs of our method can reach 98.74%, 98.38%, 98.61% and 97.67% under the magnification factors of 40×, 100×, 200× and 400×, respectively. The accuracies can reach 95.67%, 94.17%, 94.53% and 94.12% under the magnification factors of 40×, 100×, 200× and 400×, respectively. In the KIMIA Path 960 dataset, the AUCs and accuracy of our method can reach 99.91% and 99.03%. It is superior to the currently popular methods and can significantly improve the effectiveness of medical image classification.
文摘Effective small object detection is crucial in various applications including urban intelligent transportation and pedestrian detection.However,small objects are difficult to detect accurately because they contain less information.Many current methods,particularly those based on Feature Pyramid Network(FPN),address this challenge by leveraging multi-scale feature fusion.However,existing FPN-based methods often suffer from inadequate feature fusion due to varying resolutions across different layers,leading to suboptimal small object detection.To address this problem,we propose the Two-layerAttention Feature Pyramid Network(TA-FPN),featuring two key modules:the Two-layer Attention Module(TAM)and the Small Object Detail Enhancement Module(SODEM).TAM uses the attention module to make the network more focused on the semantic information of the object and fuse it to the lower layer,so that each layer contains similar semantic information,to alleviate the problem of small object information being submerged due to semantic gaps between different layers.At the same time,SODEM is introduced to strengthen the local features of the object,suppress background noise,enhance the information details of the small object,and fuse the enhanced features to other feature layers to ensure that each layer is rich in small object information,to improve small object detection accuracy.Our extensive experiments on challenging datasets such as Microsoft Common Objects inContext(MSCOCO)and Pattern Analysis Statistical Modelling and Computational Learning,Visual Object Classes(PASCAL VOC)demonstrate the validity of the proposedmethod.Experimental results show a significant improvement in small object detection accuracy compared to state-of-theart detectors.
基金supported in part by the General Program Hunan Provincial Natural Science Foundation of 2022,China(2022JJ31022)the Undergraduate Education Reform Project of Hunan Province,China(HNJG-20210532)the National Natural Science Foundation of China(62276276)。
文摘Accurate diagnosis of apple leaf diseases is crucial for improving the quality of apple production and promoting the development of the apple industry. However, apple leaf diseases do not differ significantly from image texture and structural information. The difficulties in disease feature extraction in complex backgrounds slow the related research progress. To address the problems, this paper proposes an improved multi-scale inverse bottleneck residual network model based on a triplet parallel attention mechanism, which is built upon ResNet-50, while improving and combining the inception module and ResNext inverse bottleneck blocks, to recognize seven types of apple leaf(including six diseases of alternaria leaf spot, brown spot, grey spot, mosaic, rust, scab, and one healthy). First, the 3×3 convolutions in some of the residual modules are replaced by multi-scale residual convolutions, the convolution kernels of different sizes contained in each branch of the multi-scale convolution are applied to extract feature maps of different sizes, and the outputs of these branches are multi-scale fused by summing to enrich the output features of the images. Second, the global layer-wise dynamic coordinated inverse bottleneck structure is used to reduce the network feature loss. The inverse bottleneck structure makes the image information less lossy when transforming from different dimensional feature spaces. The fusion of multi-scale and layer-wise dynamic coordinated inverse bottlenecks makes the model effectively balances computational efficiency and feature representation capability, and more robust with a combination of horizontal and vertical features in the fine identification of apple leaf diseases. Finally, after each improved module, a triplet parallel attention module is integrated with cross-dimensional interactions among channels through rotations and residual transformations, which improves the parallel search efficiency of important features and the recognition rate of the network with relatively small computational costs while the dimensional dependencies are improved. To verify the validity of the model in this paper, we uniformly enhance apple leaf disease images screened from the public data sets of Plant Village, Baidu Flying Paddle, and the Internet. The final processed image count is 14,000. The ablation study, pre-processing comparison, and method comparison are conducted on the processed datasets. The experimental results demonstrate that the proposed method reaches 98.73% accuracy on the adopted datasets, which is 1.82% higher than the classical ResNet-50 model, and 0.29% better than the apple leaf disease datasets before preprocessing. It also achieves competitive results in apple leaf disease identification compared to some state-ofthe-art methods.
基金This paper is partially supported by Open Fund for Jiangsu Key Laboratory of Advanced Manufacturing Technology(HGAMTL-1703)Guangxi Key Laboratory of Trusted Software(kx201901)+5 种基金Fundamental Research Funds for the Central Universities(CDLS-2020-03)Key Laboratory of Child Development and Learning Science(Southeast University),Ministry of EducationRoyal Society International Exchanges Cost Share Award,UK(RP202G0230)Medical Research Council Confidence in Concept Award,UK(MC_PC_17171)Hope Foundation for Cancer Research,UK(RM60G0680)British Heart Foundation Accelerator Award,UK.
文摘Aim: To diagnose COVID-19 more efficiently and more correctly, this study proposed a novel attention network forCOVID-19 (ANC). Methods: Two datasets were used in this study. An 18-way data augmentation was proposed toavoid overfitting. Then, convolutional block attention module (CBAM) was integrated to our model, the structureof which is fine-tuned. Finally, Grad-CAM was used to provide an explainable diagnosis. Results: The accuracyof our ANC methods on two datasets are 96.32% ± 1.06%, and 96.00% ± 1.03%, respectively. Conclusions: Thisproposed ANC method is superior to 9 state-of-the-art approaches.
基金Ministry of Science and Technology Basic Resources Survey Special Project,Grant/Award Number:2019FY100900High-level Hospital Construction Project,Grant/Award Number:DFJH2019015+2 种基金National Natural Science Foundation of China,Grant/Award Number:61871021Guangdong Natural Science Foundation,Grant/Award Number:2019A1515011676Beijing Key Laboratory of Robotics Bionic and Functional Research。
文摘Aiming at the problem that the existing models have a poor segmentation effect on imbalanced data sets with small-scale samples,a bilateral U-Net network model with a spatial attention mechanism is designed.The model uses the lightweight MobileNetV2 as the backbone network for feature hierarchical extraction and proposes an Attentive Pyramid Spatial Attention(APSA)module compared to the Attenuated Spatial Pyramid module,which can increase the receptive field and enhance the information,and finally adds the context fusion prediction branch that fuses high-semantic and low-semantic prediction results,and the model effectively improves the segmentation accuracy of small data sets.The experimental results on the CamVid data set show that compared with some existing semantic segmentation networks,the algorithm has a better segmentation effect and segmentation accuracy,and its mIOU reaches 75.85%.Moreover,to verify the generality of the model and the effectiveness of the APSA module,experiments were conducted on the VOC 2012 data set,and the APSA module improved mIOU by about 12.2%.
基金supported by the NationalNatural Science Foundation of China(No.62001272).
文摘Extracting useful details from images is essential for the Internet of Things project.However,in real life,various external environments,such as badweather conditions,will cause the occlusion of key target information and image distortion,resulting in difficulties and obstacles to the extraction of key information,affecting the judgment of the real situation in the process of the Internet of Things,and causing system decision-making errors and accidents.In this paper,we mainly solve the problem of rain on the image occlusion,remove the rain grain in the image,and get a clear image without rain.Therefore,the single image deraining algorithm is studied,and a dual-branch network structure based on the attention module and convolutional neural network(CNN)module is proposed to accomplish the task of rain removal.In order to complete the rain removal of a single image with high quality,we apply the spatial attention module,channel attention module and CNN module to the network structure,and build the network using the coder-decoder structure.In the experiment,with the structural similarity(SSIM)and the peak signal-to-noise ratio(PSNR)as evaluation indexes,the training and testing results on the rain removal dataset show that the proposed structure has a good effect on the single image deraining task.
基金Supported by National Natural Science Foundation of China (Grant Nos.62033001 and 52175075)Chongqing Municipal Graduate Scientific Research and Innovation Foundation of China (Grant No.CYB21010)。
文摘The judgment of gear failure is based on the pitting area ratio of gear.Traditional gear pitting calculation method mainly rely on manual visual inspection.This method is greatly affected by human factors,and is greatly affected by the working experience,training degree and fatigue degree of the detection personnel,so the detection results may be biased.The non-contact computer vision measurement can carry out non-destructive testing and monitoring under the working condition of the machine,and has high detection accuracy.To improve the measurement accuracy of gear pitting,a novel multi-scale splicing attention U-Net(MSSA U-Net)is explored in this study.An image splicing module is first proposed for concatenating the output feature maps of multiple convolutional layers into a splicing feature map with more semantic information.Then,an attention module is applied to select the key features of the splicing feature map.Given that MSSA U-Net adequately uses multi-scale semantic features,it has better segmentation performance on irregular small objects than U-Net and attention U-Net.On the basis of the designed visual detection platform and MSSA U-Net,a methodology for measuring the area ratio of gear pitting is proposed.With three datasets,experimental results show that MSSA U-Net is superior to existing typical image segmentation methods and can accurately segment different levels of pitting due to its strong segmentation ability.Therefore,the proposed methodology can be effectively applied in measuring the pitting area ratio and determining the level of gear pitting.
基金This work was funded by the Deanship of Scientific Research at Jouf University under grant No(DSR-2021-02-0379).
文摘Indoor localization methods can help many sectors,such as healthcare centers,smart homes,museums,warehouses,and retail malls,improve their service areas.As a result,it is crucial to look for low-cost methods that can provide exact localization in indoor locations.In this context,imagebased localization methods can play an important role in estimating both the position and the orientation of cameras regarding an object.Image-based localization faces many issues,such as image scale and rotation variance.Also,image-based localization’s accuracy and speed(latency)are two critical factors.This paper proposes an efficient 6-DoF deep-learning model for image-based localization.This model incorporates the channel attention module and the Scale PyramidModule(SPM).It not only enhances accuracy but also ensures the model’s real-time performance.In complex scenes,a channel attention module is employed to distinguish between the textures of the foregrounds and backgrounds.Our model adapted an SPM,a feature pyramid module for dealing with image scale and rotation variance issues.Furthermore,the proposed model employs two regressions(two fully connected layers),one for position and the other for orientation,which increases outcome accuracy.Experiments on standard indoor and outdoor datasets show that the proposed model has a significantly lower Mean Squared Error(MSE)for both position and orientation.On the indoor 7-Scenes dataset,the MSE for the position is reduced to 0.19 m and 6.25°for the orientation.Furthermore,on the outdoor Cambridge landmarks dataset,the MSE for the position is reduced to 0.63 m and 2.03°for the orientation.According to the findings,the proposed approach is superior and more successful than the baseline methods.
基金The work was supported by the National Key R&D Program of China(Grant No.2020YFC1511601)Fundamental Research Funds for the Central Universities(Grant No.2019SHFWLC01).
文摘Existing almost deep learning methods rely on a large amount of annotated data, so they are inappropriate for forest fire smoke detection with limited data. In this paper, a novel hybrid attention-based few-shot learning method, named Attention-Based Prototypical Network, is proposed for forest fire smoke detection. Specifically, feature extraction network, which consists of convolutional block attention module, could extract high-level and discriminative features and further decrease the false alarm rate resulting from suspected smoke areas. Moreover, we design a metalearning module to alleviate the overfitting issue caused by limited smoke images, and the meta-learning network enables achieving effective detection via comparing the distance between the class prototype of support images and the features of query images. A series of experiments on forest fire smoke datasets and miniImageNet dataset testify that the proposed method is superior to state-of-the-art few-shot learning approaches.
基金This work was supported by the National Natural Science Foundation of China(No.61906006).
文摘Whole brain functional connectivity(FC)patterns obtained from resting-state functional magnetic resonance imaging(rs-fMRI)have been widely used in the diagnosis of brain disorders such as autism spectrum disorder(ASD).Recently,an increasing number of studies have focused on employing deep learning techniques to analyze FC patterns for brain disease classification.However,the high dimensionality of the FC features and the interpretation of deep learning results are issues that need to be addressed in the FC-based brain disease classification.In this paper,we proposed a multi-scale attention-based deep neural network(MSA-DNN)model to classify FC patterns for the ASD diagnosis.The model was implemented by adding a flexible multi-scale attention(MSA)module to the auto-encoder based backbone DNN,which can extract multi-scale features of the FC patterns and change the level of attention for different FCs by continuous learning.Our model will reinforce the weights of important FC features while suppress the unimportant FCs to ensure the sparsity of the model weights and enhance the model interpretability.We performed systematic experiments on the large multi-sites ASD dataset with both ten-fold and leaveone-site-out cross-validations.Results showed that our model outperformed classical methods in brain disease classification and revealed robust intersite prediction performance.We also localized important FC features and brain regions associated with ASD classification.Overall,our study further promotes the biomarker detection and computer-aided classification for ASD diagnosis,and the proposed MSA module is flexible and easy to implement in other classification networks.
基金This work was funded by the China Postdoctoral Science Foundation(No.2019M661319)Heilongjiang Postdoctoral Scientific Research Developmental Foundation(No.LBH-Q17042)+1 种基金Fundamental Research Funds for the Central Universities(3072020CFQ0602,3072020CF0604,3072020CFP0601)2019 Industrial Internet Innovation and Development Engineering(KY1060020002,KY10600200008).
文摘The technology for image-to-image style transfer(a prevalent image processing task)has developed rapidly.The purpose of style transfer is to extract a texture from the source image domain and transfer it to the target image domain using a deep neural network.However,the existing methods typically have a large computational cost.To achieve efficient style transfer,we introduce a novel Ghost module into the GANILLA architecture to produce more feature maps from cheap operations.Then we utilize an attention mechanism to transform images with various styles.We optimize the original generative adversarial network(GAN)by using more efficient calculation methods for image-to-illustration translation.The experimental results show that our proposed method is similar to human vision and still maintains the quality of the image.Moreover,our proposed method overcomes the high computational cost and high computational resource consumption for style transfer.By comparing the results of subjective and objective evaluation indicators,our proposed method has shown superior performance over existing methods.
基金funded by the National Key Research and Development Program of China (Grant No.2022YFF1302300)the Key R&D and Achievement Transformation Plan Project of Inner Mongolia (Grant No.2023YFDZ0006)+1 种基金the Program for Improving the Scientific Research Ability of Youth Teachers of Inner Mongolia Agricultural University (Grant No.BR220128)the Research Program of science and technology at Universities of Inner Mongolia Autonomous Region (Grant No.NJZZ23046).
文摘The non-destructive recognition of coated seeds is crucial for advancing studies in coating theory.Currently,the recognition of coated seeds heavily relies on manual visual inspection and machine vision detection.However,these methods pose challenges such as high misclassification rates,low recognition efficiency,and elevated labor intensity.In response to the aforementioned challenges,this study leveraged deep learning techniques to develop a coated seed recognition model named YOLO-Coated Seeds Recognition(YOLO-CSR),aiming to address the challenges posed by coated seed recognition tasks.The experiment of this study mainly includes the following steps:First,a seed coating machine was set up to coat red clover seeds,resulting in three types of coated red clover seeds.Subsequently,by collecting images of the three types of coated seeds,a coated seed image dataset was further constructed.Then,the YOLOv5s was built,incorporating the Convolutional Block Attention Module(CBAM)into the model’s backbone to enhance its ability to learn features of coated seeds.Finally,the training results of YOLO-CSR were compared with those of other classical recognition models.The experimental results showed that YOLO-CSR achieved the best recognition performance on the self-built coated seed image dataset.The average precision(AP)for recognizing the three types of coated seeds reached 98.43%,97.91%,and 97.26%,with a mean average precision@0.5(mAP@0.5)of 97.87%.Compared to YOLOv5,YOLO-CSR showed a 1.18%improvement in mAP@0.5.Additionally,YOLO-CSR has a model size of only 14.9 MB,with an average recognition time(ART)of 10.1 ms and a frame per second(FPS)of 99.Experimental results prove that YOLO-CSR can accurately,efficiently,and rapidly recognize coated red clover seeds.The findings of this study provide technical support for the non-destructive recognition of spherical coated seeds.