To address the complex model structures and large number of training parameters in facial expression recognition algorithms, we propose a residual network with a multi-headed channel attention (MCA) module. Transfer learning is used to pre-train the convolutional layer parameters and mitigate the overfitting caused by an insufficient number of training samples. The MCA module is integrated into the ResNet18 backbone network. The attention mechanism highlights important information and suppresses irrelevant information by assigning different coefficients or weights, and the multi-head structure focuses more on local features of the images, which improves the efficiency of facial expression recognition. Experimental results show that the proposed model achieves accuracy rates of 72.7%, 98.8%, and 93.33% on the Fer2013, CK+, and Jaffe datasets, respectively.
Many existing multi-view gait recognition methods based on images or video sequences superimpose and synthesize gait sequences into images to construct energy-like templates. However, information may be lost while compositing images and capturing EMG signals, and factors such as period detection may introduce errors and reduce recognition accuracy. To address these problems, a multi-view gait recognition method using a deep convolutional neural network and a channel attention mechanism is proposed. First, a sliding time window method is used to capture EMG signals. Then, the back-propagation learning algorithm is used to train each convolutional layer, which improves the learning ability of the convolutional neural network. Finally, the channel attention mechanism is integrated into the network to improve its ability to express gait features, and a classifier is used to classify gait. Experimental results on two public datasets, OULP and CASIA-B, show that the proposed method achieves recognition rates of 88.44% and 97.25%, respectively. Comparative experiments show that the proposed method recognizes gait better than several other recent convolutional neural network methods. The combination of a convolutional neural network and a channel attention mechanism is therefore of great value for gait recognition.
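The sliding time window step mentioned in this abstract can be illustrated with a minimal sketch. The function name `sliding_windows` and the `window` and `step` values are assumptions chosen for illustration; the abstract does not state the window length or overlap the authors used.

```python
def sliding_windows(signal, window, step):
    """Split a 1-D sequence into fixed-length, possibly overlapping windows.

    `window` and `step` are hypothetical parameters; the abstract gives no values.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = len(signal) - window
    return [signal[i:i + window] for i in range(0, last_start + 1, step)]


# Ten samples, windows of four samples, advancing two samples at a time.
samples = list(range(10))
windows = sliding_windows(samples, window=4, step=2)
```

Each window would then be passed to the network as one input segment; a smaller `step` gives more overlap and more training segments from the same recording.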
Convolutional neural networks (CNNs) have shown great potential for image super-resolution (SR). However, most existing CNNs only reconstruct images in the spatial domain, resulting in insufficient high-frequency details in the reconstructed images. To address this issue, a channel attention based wavelet cascaded network for image super-resolution (CWSR) is proposed. Specifically, a second-order channel attention (SOCA) mechanism is incorporated into the network, and covariance matrix normalization is used to explore interdependencies between channel-wise features. Then, to boost the quality of residual features, a non-local module is adopted to further improve the global information integration ability of the network. Finally, taking the image loss in both the spatial and wavelet domains into account, a dual-constrained loss function is proposed to optimize the network. Experimental results show that CWSR outperforms several state-of-the-art methods in terms of both visual quality and quantitative metrics.
Image deraining is a highly ill-posed problem. Although significant progress has been made through the use of deep convolutional neural networks, the problem remains challenging, especially for detail restoration and generalization to real rain images. In this paper, we propose a deep residual channel attention network (DeRCAN) for deraining. The channel attention mechanism is able to capture the inherent properties of the feature space and thus facilitates more accurate estimation of structures and details for image deraining. In addition, we propose an unsupervised learning approach based on the proposed network to better handle real rain images. Extensive qualitative and quantitative evaluation results on both synthetic and real-world images demonstrate that the proposed DeRCAN performs favorably against state-of-the-art methods.
Structured illumination microscopy (SIM) is a popular and powerful super-resolution (SR) technique in biomedical research. However, the conventional reconstruction algorithm for SIM relies heavily on accurate prior knowledge of the illumination patterns and on the signal-to-noise ratio (SNR) of the raw images. To obtain high-quality SR images, several raw images need to be captured under high fluorescence levels, which further restricts SIM's temporal resolution and its applications. Deep learning (DL) is a data-driven technology that has been used to expand the limits of optical microscopy. In this study, we propose a deep neural network based on multi-level wavelet and attention mechanism (MWAM) for SIM. Our results show that the MWAM network can extract high-frequency information contained in SIM raw images and accurately integrate it into the output image, resulting in superior SR images compared to those generated using wide-field images as input data. We also demonstrate that the number of SIM raw images can be reduced to three, with one image in each illumination orientation, to achieve the optimal tradeoff between temporal and spatial resolution. Furthermore, our MWAM network exhibits superior reconstruction ability on low-SNR images compared to conventional SIM algorithms. We have also analyzed the adaptability of this network to other biological samples and successfully applied the pretrained model to other SIM systems.
Alzheimer's disease (AD) is a complex, progressive neurodegenerative disorder. The subtle and insidious onset of its pathogenesis makes early detection a formidable challenge in both contemporary neuroscience and clinical practice. In this study, we introduce an advanced diagnostic methodology rooted in the Med-3D transfer model and enhanced with an attention mechanism. We aim to improve the precision of AD diagnosis and facilitate its early identification. Initially, we employ a spatial normalization technique to address challenges such as clarity degradation and unsaturation, which are commonly observed in imaging datasets. Subsequently, an attention mechanism is incorporated to selectively focus on the salient features within the imaging data. Building upon this foundation, we present the novel Med-3D transfer model, designed to further elucidate and amplify the intricate features associated with AD pathogenesis. Our proposed model has demonstrated promising results, achieving a classification accuracy of 92%. To emphasize the robustness and practicality of our approach, we introduce an adaptive "hot-updating" auxiliary diagnostic system. This system not only enables continuous model training and optimization but also provides a dynamic platform to meet the real-time diagnostic and therapeutic demands of AD.
In recent years, convolutional neural networks (CNNs) for single image super-resolution (SISR) have become more and more complex, and it has become more challenging to improve SISR performance. In contrast, reference image guided super-resolution (RefSR) is an effective strategy to boost super-resolution (SR) performance. In RefSR, the introduced high-resolution (HR) references can facilitate the high-frequency residual prediction process. To the best of our knowledge, existing CNN-based RefSR methods treat the features from the references and the low-resolution (LR) input equally by simply concatenating them. However, the HR references and the LR inputs contribute differently to the final SR results. Therefore, we propose a progressive channel attention network (PCANet) for RefSR. This paper makes two technical contributions. First, we propose a novel channel attention module (CAM), which estimates the channel weighting parameter from a weighted average of the spatial features instead of a global average. Second, considering that the residual prediction process can be improved when the LR input is enriched with more details, we perform super-resolution progressively, which takes advantage of the reference images at multiple scales. Extensive quantitative and qualitative evaluations on three benchmark datasets, which represent three typical scenarios for RefSR, demonstrate that our method is superior to state-of-the-art SISR and RefSR methods in terms of PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity).
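The contrast between global averaging and a weighted spatial average can be sketched as follows. This is only an illustration of the general idea, not the CAM described in the paper: the per-position `scores` are a hypothetical input (in a real network they would presumably be produced by learned layers), and the softmax normalization is an assumption.

```python
import math


def global_average(feature_map):
    """Plain global average pooling of one channel given as a 2-D list."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def weighted_average(feature_map, scores):
    """Spatially weighted average of one channel.

    A softmax over the hypothetical `scores` yields per-position weights
    that sum to one, so uniform scores reduce to global averaging.
    """
    values = [v for row in feature_map for v in row]
    flat_scores = [s for row in scores for s in row]
    peak = max(flat_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in flat_scores]
    total = sum(exps)
    return sum(v * e / total for v, e in zip(values, exps))


channel = [[1.0, 2.0], [3.0, 4.0]]
uniform_scores = [[0.0, 0.0], [0.0, 0.0]]
peaked_scores = [[0.0, 0.0], [0.0, 5.0]]  # emphasizes the bottom-right position
```

With `uniform_scores` both functions return the same descriptor; with `peaked_scores` the descriptor moves toward the value at the emphasized position, which is the behavior a spatially weighted channel descriptor is meant to provide.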
This research developed a hybrid position-channel network (named PCNet) by incorporating newly designed channel and position attention modules into U-Net to alleviate the crack discontinuity problem in the channel and spatial dimensions. In PCNet, U-Net is used as a baseline to extract informative spatial and channel-wise features from shield tunnel lining crack images. A channel attention module and a position attention module are designed and embedded after each convolution layer of U-Net to model the feature interdependencies in the channel and spatial dimensions. These attention modules allow U-Net to adaptively integrate local crack features with their global dependencies. Experiments were conducted on a dataset based on images from Shanghai metro shield tunnels. The results validate the effectiveness of the designed channel and position attention modules, since they individually increase balanced accuracy (BA) by 11.25% and 12.95%, intersection over union (IoU) by 10.79% and 11.83%, and F1 score by 9.96% and 10.63%, respectively. In comparison with state-of-the-art models (i.e., LinkNet, PSPNet, U-Net, PANet, and Mask R-CNN) on the testing dataset, the proposed PCNet outperforms the others with an improvement in BA, IoU, and F1 score owing to the channel and position attention modules. These evaluation metrics indicate that the proposed PCNet produces refined crack segmentation with improved performance and is a practicable approach to segmenting shield tunnel lining cracks in field practice.
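For readers unfamiliar with the three metrics reported here, the sketch below computes balanced accuracy, IoU, and F1 score for a pair of binary masks using their standard definitions. The function name and the toy masks are illustrative assumptions and are unrelated to the paper's data.

```python
def segmentation_metrics(pred, truth):
    """Return BA, IoU and F1 for two binary masks given as 2-D lists of 0/1."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1

    def ratio(num, den):
        # Return 0.0 for an empty denominator instead of raising an error.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "BA": 0.5 * (sensitivity + specificity),
        "IoU": ratio(tp, tp + fp + fn),
        "F1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy masks: one true positive, one false positive, one false negative.
predicted = [[1, 1, 0], [0, 0, 0]]
reference = [[1, 0, 0], [1, 0, 0]]
scores = segmentation_metrics(predicted, reference)
```

Because crack pixels are a small fraction of each image, balanced accuracy is a more informative summary than plain pixel accuracy, which is dominated by the background class.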
End-to-end separation algorithms, which perform very well in speech separation, have not been used effectively in music separation. Moreover, since music signals are often dual-channel data with a high sampling rate, modeling long-sequence data and making rational use of the relevant information between channels are also urgent problems. To solve these problems, the performance of the end-to-end music separation algorithm is enhanced by improving the network structure. Our main contributions are as follows: (1) A more reasonable densely connected U-Net is designed to capture the long-term characteristics of music, such as the main melody and tone. (2) On this basis, multi-head attention and a dual-path transformer are introduced in the separation module. Channel attention units are applied recursively to the feature map of each layer of the network, enabling the network to perform long-sequence separation. Experimental results show that, after the introduction of channel attention, the proposed algorithm improves consistently over the baseline system. On the MUSDB18 dataset, the average score of the separated audio exceeds that of the current best-performing music separation algorithm based on the time-frequency domain (T-F domain).
Visual question answering (VQA) has attracted more and more attention in computer vision and natural language processing. Researchers are studying how to better integrate image features and text features to achieve better results in VQA tasks. Analyzing all features may cause information redundancy and a heavy computational burden, and an attention mechanism is a sensible way to solve this problem. However, using a single attention mechanism may lead to incomplete coverage of the features. This paper improves the attention mechanism and proposes a hybrid attention mechanism that combines spatial attention and channel attention. Because the attention mechanism can cause the loss of original features, a small portion of the image features is added back as compensation. For the text features, a self-attention mechanism is introduced, and the internal structural features of sentences are strengthened to improve the overall model. The results show that the attention mechanism and feature compensation add 6.1% accuracy to the multimodal low-rank bilinear pooling network.
This paper proposes an end-to-end channel attention and pixel attention network (CP-Net) that produces dehazed images directly. The CP-Net structure contains three critical components. First, the double attention (DA) module consists of channel attention (CA) and pixel attention (PA). Different channel features contain different levels of important information, and CA can give more weight to relevant information so that the network learns more useful information. Meanwhile, haze is unevenly distributed across pixels, and PA is able to filter out haze with varying weights for different pixels. The DA module sums the outputs of the two attention modules to further improve the feature representation, which contributes to a better dehazing result. Second, local residual learning and the DA module constitute another important component, the basic block structure. Local residual learning transfers the feature information in the shallow part of the network to the deep part through multiple local residual connections and enhances the expressive ability of CP-Net. Third, CP-Net mainly uses its core component, the DA module, to automatically assign different weights to different features and achieve a satisfactory dehazing effect. Experimental results on synthetic datasets and real hazy images indicate that CP-Net surpasses many state-of-the-art single image dehazing methods both quantitatively and qualitatively.
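The double attention idea described above (one weight per channel, one weight per pixel, outputs summed) can be sketched in a few lines. This is a deliberately simplified illustration under stated assumptions: the weights here come from a sigmoid of plain averages, whereas the actual CA and PA modules presumably use learned layers that the abstract does not specify. All function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(x):
    """Scale each channel by one weight derived from its spatial mean."""
    out = []
    for channel in x:
        mean = sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        weight = sigmoid(mean)  # assumption: stands in for learned layers
        out.append([[weight * v for v in row] for row in channel])
    return out


def pixel_attention(x):
    """Scale each pixel by one weight derived from its mean across channels."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    weights = [[sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)]
               for i in range(h)]
    return [[[weights[i][j] * x[k][i][j] for j in range(w)] for i in range(h)]
            for k in range(c)]


def double_attention(x):
    """Sum the outputs of the channel and pixel attention branches."""
    ca, pa = channel_attention(x), pixel_attention(x)
    return [[[a + b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(ca, pa)]


# Two channels of size 1 x 2, laid out as [channel][row][column].
features = [[[0.0, 0.0]], [[2.0, 2.0]]]
fused = double_attention(features)
```

The output keeps the input shape, so a block like this can be wrapped in a local residual connection without any reshaping.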
Vehicle detection plays a crucial role in autonomous driving technology. However, directly applying deep learning-based object detection algorithms to complex road scene images often leads to subpar performance and slow inference in vehicle detection. Achieving a balance between accuracy and detection speed is crucial for real-time object detection in real-world road scenes. This paper proposes a high-precision and fast vehicle detector called the feature-guided bidirectional pyramid network (FBPN). First, to tackle challenges such as vehicle occlusion and significant background interference, the efficient feature filtering module (EFFM) is introduced into the deep network, which amplifies the disparities between the features of the vehicle and the background. Second, the proposed global attention localization module (GALM) in the model neck effectively perceives the detailed position information of the target, improving both the accuracy and the inference speed of the model. Finally, the detection accuracy for small-scale vehicles is further enhanced by a four-layer feature pyramid structure. Experimental results show that FBPN achieves an average precision of 60.8% and 97.8% on the BDD100K and KITTI datasets, respectively, with inference speeds reaching 344.83 frames/s and 357.14 frames/s. FBPN demonstrates its effectiveness by striking a balance between detection accuracy and inference speed, outperforming several state-of-the-art methods.
Acoustic source localization (ASL) and sound event detection (SED) are two widely pursued independent research fields. In recent years, in order to achieve a more complete spatial and temporal representation of the sound field, sound event localization and detection (SELD) has become a very active research topic. This paper presents a deep learning-based multi-overlapping sound event localization and detection algorithm in three-dimensional space. The log-Mel spectrum and the generalized cross-correlation spectrum are concatenated along the channel dimension as input features. After training, a neural network classifies and regresses these features in parallel to obtain sound recognition and localization results, respectively. A channel attention mechanism is also introduced in the network to selectively enhance the features containing essential information and suppress useless features. Finally, a thorough comparison confirms the efficiency and effectiveness of the proposed SELD algorithm. Field experiments show that the proposed algorithm is robust to reverberation and the environment and achieves higher recognition and localization accuracy than the baseline method.
Almost all existing deep learning methods rely on a large amount of annotated data, so they are unsuitable for forest fire smoke detection with limited data. In this paper, a novel hybrid attention-based few-shot learning method, named Attention-Based Prototypical Network, is proposed for forest fire smoke detection. Specifically, the feature extraction network, which incorporates the convolutional block attention module, extracts high-level and discriminative features and further decreases the false alarm rate caused by suspected smoke areas. Moreover, we design a meta-learning module to alleviate the overfitting caused by limited smoke images; the meta-learning network achieves effective detection by comparing the distance between the class prototypes of the support images and the features of the query images. A series of experiments on forest fire smoke datasets and the miniImageNet dataset show that the proposed method is superior to state-of-the-art few-shot learning approaches.
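The prototype comparison described above can be sketched as follows: each class prototype is the mean of its support embeddings, and a query is assigned to the class with the nearest prototype. The Euclidean distance, the two-dimensional embeddings, and the class labels are assumptions for illustration; the abstract does not state the distance measure or embedding size.

```python
import math


def prototype(embeddings):
    """Mean of a list of equal-length embedding vectors."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def classify(query, support):
    """Return the label whose class prototype is closest to `query`.

    `support` maps a label to the embeddings of its support images.
    Euclidean distance is an assumption made for this sketch.
    """
    prototypes = {label: prototype(vectors) for label, vectors in support.items()}
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Hypothetical 2-D embeddings; real ones would come from the feature extractor.
support_set = {
    "smoke": [[1.0, 1.0], [3.0, 1.0]],
    "no_smoke": [[-2.0, 0.0], [-2.0, -2.0]],
}
label = classify([1.5, 0.5], support_set)
```

Because classification depends only on distances to a few prototypes, no new classifier weights need to be trained when a new class with a handful of labeled images is added.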
Network intrusion detection systems (NIDS) based on deep learning have continued to make significant advances. However, the following challenges remain. On the one hand, applying only Temporal Convolutional Networks (TCNs) can lead to models that ignore the impact of network traffic features at different scales on detection performance. On the other hand, some intrusion detection methods consider multi-scale information of traffic data, but considering only forward network traffic information can lead to deficiencies in capturing multi-scale temporal features. To address both of these issues, we propose a hybrid Convolutional Neural Network that supports a multi-output strategy (BONUS) for industrial internet intrusion detection. First, we create a multi-scale Temporal Convolutional Network by stacking TCNs of different scales to capture the multi-scale information of network traffic. Meanwhile, we propose a bi-directional structure and dynamically set the weights to fuse the forward and backward contextual information of network traffic at each scale, enhancing the model's ability to capture the multi-scale temporal features of network traffic. In addition, we introduce a gated network for each of the two branches in the proposed method to help the model learn the feature representation of each branch. Extensive experiments on two publicly available traffic intrusion detection datasets, UNSW-NB15 and NSL-KDD, show F1 scores of 85.03% and 99.31%, respectively, which also validates that enhancing the model's ability to capture multi-scale temporal features of traffic data improves detection performance.
Image inpainting based on deep learning has improved greatly. The original purpose of image inpainting was to repair damaged photos, for example by inpainting artifacts. However, it may also be used for malicious operations, such as destroying evidence. Therefore, detection and localization of image inpainting operations are essential. Recent research shows that the high-pass filtering fully convolutional network (HPFCN) achieves good results in image inpainting detection. However, such methods do not consider the spatial location and channel information of the feature map. To address these shortcomings, we introduce squeeze-and-excitation (SE) blocks and propose a high-pass filter attention fully convolutional network (HPACN). In feature extraction, we apply concurrent spatial and channel attention (scSE) to enhance feature extraction and obtain more information. Channel attention (cSE) is introduced in upsampling to enhance detection and localization. The experimental results show that the proposed method achieves an improvement on ImageNet.
Recent applications of convolutional neural networks (CNNs) in single image super-resolution (SISR) have achieved unprecedented performance. However, existing CNN-based SISR network designs mostly consider only channel or spatial information and cannot make full use of both to improve SISR performance further. The present work addresses this problem by proposing a mixed attention densely residual network architecture that makes full and simultaneous use of both channel and spatial information. Specifically, we propose a residual-in-dense network structure composed of dense connections between multiple dense residual groups to form a very deep network. This structure allows each dense residual group to apply a local residual skip connection and enables the cascading of multiple residual blocks to reuse previous features. A mixed attention module is inserted into each dense residual group to fuse channel attention with Laplacian spatial attention effectively, and thereby focus more adaptively on valuable feature learning. The qualitative and quantitative results of extensive experiments demonstrate that the proposed method performs comparably to other state-of-the-art methods.
Cataract is the leading cause of visual impairment globally. The scarcity and uneven distribution of ophthalmologists seriously hinder early visual impairment grading for cataract patients in the clinic. In this study, a deep learning-based automated grading system of visual impairment in cataract patients is proposed using a multi-scale efficient channel attention convolutional neural network (MECA_CNN). First, the efficient channel attention mechanism is applied in the MECA_CNN to extract multi-scale features of fundus images, which can effectively focus on lesion-related regions. Then, asymmetric convolutional modules are embedded in the residual unit to reduce the information loss of fine-grained features in fundus images. In addition, an asymmetric loss function is applied to address the higher false-negative rate and weak generalization ability caused by the imbalanced dataset. A total of 7299 fundus images derived from two clinical centers are employed to develop and evaluate the MECA_CNN for identifying mild visual impairment caused by cataract (MVICC), moderate to severe visual impairment caused by cataract (MSVICC), and normal samples. The experimental results demonstrate that the MECA_CNN provides clinically meaningful performance for visual impairment grading in the internal test dataset: MVICC (accuracy, sensitivity, and specificity: 91.3%, 89.9%, and 92%), MSVICC (93.2%, 78.5%, and 96.7%), and normal samples (98.1%, 98.0%, and 98.1%). Comparable performance is achieved in the external test dataset, further verifying the effectiveness and generalizability of the MECA_CNN model. This study provides a practical deep learning-based system for the automated grading of visual impairment in cataract patients, facilitating the timely formulation of treatment strategies and improving patients' vision prognosis.
Separating individual pigs from pigpen scenes is crucial for precision farming, and technology based on convolutional neural networks can provide a low-cost, non-contact, non-invasive method of pig image segmentation. However, two factors limit the development of this field. On the one hand, individual pigs tend to stick together, and occlusion by clutter such as pigpens can easily cause the model to misjudge. On the other hand, manual labeling of group-raised pig data is time-consuming, labor-intensive, and prone to labeling errors. There is therefore an urgent need for an individual pig image segmentation model that performs well in individual scenarios and can be easily transferred to a group-raised environment. To solve these problems, taking individual pigs as research objects, an individual pig image segmentation dataset containing 2066 images was constructed, and a series of algorithms based on fully convolutional networks was proposed to solve the pig image segmentation problem. To capture long-range dependencies and weaken background information such as pigpens while enhancing the information of individual parts of pigs, channel and spatial attention blocks were introduced into the best-performing decoders, UNet and LinkNet. Experiments show that, with ResNext50 as the encoder and UNet as the decoder as the basic model, adding both attention blocks at the same time achieves 98.30% and 96.71% on the F1 and IOU metrics, respectively. Compared with the model that adds the channel attention block alone, the two metrics improve by 0.13% and 0.22%, respectively. Experiments that introduce channel and spatial attention separately show that spatial attention is more effective than channel attention. Taking VGG16-LinkNet as an example, compared with channel attention, spatial attention improves the F1 and IOU metrics by 0.16% and 0.30%, respectively. Furthermore, heatmaps of the features from different decoder layers after adding different attention information show that the boundary of the pig segmentation becomes clearer as the number of layers increases. To verify the effectiveness of the individual pig image segmentation model in group-raised scenes, the transfer performance of the model is evaluated in three scenarios: high separation, deep adhesion, and pigpen occlusion. The experiments show that the segmentation results obtained by adding attention information, especially the simultaneous fusion of channel and spatial attention blocks, are more refined and complete. The attention-based individual pig image segmentation model can be effectively transferred to group-raised pigs and can provide a reference for their pre-segmentation.
As image editing technology improves, the barrier to image tampering falls, which reduces the authenticity of image content. This has also driven research on image forgery detection techniques. In this paper, a U-Net with multiple sensory field feature extraction (MSCU-Net) for image forgery detection is proposed. The proposed MSCU-Net is an end-to-end image essential attribute segmentation network that can perform image forgery detection without any pre-processing or post-processing. MSCU-Net replaces the single-scale convolution module in the original network with an improved multiple perceptual field convolution module so that the decoder can synthesize the features of different perceptual fields. It uses residual propagation and residual feedback to recall and consolidate the input feature information, making the difference in image attributes between the untampered and tampered regions more obvious, and introduces the channel coordinate confusion attention mechanism (CCCA) in the skip connections to further improve the segmentation accuracy of the network. Extensive experiments are conducted on various mainstream datasets, and the results verify the effectiveness of the proposed method, which outperforms state-of-the-art image forgery detection methods.
Funding: Funded by Anhui Province Quality Engineering Project No. 2021jyxm0801; Natural Science Foundation of Anhui University of Chinese Medicine under Grant Nos. 2020zrzd18 and 2019zrzd11; Humanity Social Science Foundation Grants 2021rwzd20 and 2020rwzd07; Anhui University of Chinese Medicine Quality Engineering Projects No. 2021zlgc046.
Abstract: To address the complex model structures and large numbers of training parameters in facial expression recognition algorithms, we propose a residual network with a multi-head channel attention (MCA) module. Transfer learning is used to pre-train the convolutional layer parameters and to mitigate the overfitting caused by an insufficient number of training samples. The designed MCA module is integrated into the ResNet18 backbone network. The attention mechanism highlights important information and suppresses irrelevant information by assigning different coefficients or weights, and the multi-head structure focuses more on the local features of the images, which improves the efficiency of facial expression recognition. Experimental results demonstrate that the proposed model achieves excellent recognition results on the Fer2013, CK+, and Jaffe datasets, with accuracy rates of 72.7%, 98.8%, and 93.33%, respectively.
Funding: This work was supported by the Natural Science Foundation of China (No. 61902133); Fujian Natural Science Foundation Project (No. 2018J05106); Xiamen Industry-University-Research Collaborative Innovation Projects (3502Z20173046).
Abstract: Many existing multi-view gait recognition methods based on images or video sequences superimpose and synthesize gait sequences into images to construct energy-like templates. However, information may be lost while compositing images and capturing EMG signals, and factors such as period detection may introduce errors and affect recognition accuracy. To address these problems, a multi-view gait recognition method using a deep convolutional neural network and a channel attention mechanism is proposed. First, a sliding time window is used to capture EMG signals. Then, the back-propagation learning algorithm is used to train each convolutional layer, which improves the learning ability of the convolutional neural network. Finally, the channel attention mechanism is integrated into the network to improve its ability to express gait features, and a classifier is used to classify gait. Experimental results on two public datasets, OULP and CASIA-B, show that the proposed method achieves recognition rates of 88.44% and 97.25%, respectively. Comparative experiments show that the proposed method recognizes gait better than several other recent convolutional neural network methods. Therefore, combining a convolutional neural network with a channel attention mechanism is of great value for gait recognition.
Funding: Supported by the National Natural Science Foundation of China (No. 61901183); Fundamental Research Funds for the Central Universities (No. ZQN921); Natural Science Foundation of Fujian Province Science and Technology Department (No. 2021H6037); Key Project of Quanzhou Science and Technology Plan (No. 2021C008R); Natural Science Foundation of Fujian Province (No. 2019J01010561); Education and Scientific Research Project for Young and Middle-aged Teachers of Fujian Province 2019 (No. JAT191080); Science and Technology Bureau of Quanzhou (No. 2017G046).
Abstract: Convolutional neural networks (CNNs) have shown great potential for image super-resolution (SR). However, most existing CNNs only reconstruct images in the spatial domain, so the reconstructed images lack high-frequency detail. To address this issue, a channel attention based wavelet cascaded network for image super-resolution (CWSR) is proposed. Specifically, a second-order channel attention (SOCA) mechanism is incorporated into the network, and covariance matrix normalization is used to explore the interdependencies between channel-wise features. Then, to boost the quality of residual features, a non-local module is adopted to further improve the global information integration ability of the network. Finally, taking the image loss in both the spatial and wavelet domains into account, a dual-constrained loss function is proposed to optimize the network. Experimental results show that CWSR outperforms several state-of-the-art methods in terms of both visual quality and quantitative metrics.
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2018AAA0102001 and the Fundamental Research Funds for the Central Universities of China under Grant No. 30920041109.
Abstract: Image deraining is a highly ill-posed problem. Although deep convolutional neural networks have brought significant progress, the problem remains challenging, especially for detail restoration and generalization to real rain images. In this paper, we propose a deep residual channel attention network (DeRCAN) for deraining. The channel attention mechanism captures the inherent properties of the feature space and thus facilitates more accurate estimation of structures and details for image deraining. In addition, we propose an unsupervised learning approach based on the proposed network to better handle real rain images. Extensive qualitative and quantitative evaluations on both synthetic and real-world images demonstrate that the proposed DeRCAN performs favorably against state-of-the-art methods.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62005307 and 61975228).
Abstract: Structured illumination microscopy (SIM) is a popular and powerful super-resolution (SR) technique in biomedical research. However, the conventional reconstruction algorithm for SIM relies heavily on accurate prior knowledge of the illumination patterns and on the signal-to-noise ratio (SNR) of the raw images. To obtain high-quality SR images, several raw images need to be captured at a high fluorescence level, which further restricts SIM's temporal resolution and its applications. Deep learning (DL) is a data-driven technology that has been used to expand the limits of optical microscopy. In this study, we propose a deep neural network based on multi-level wavelet and attention mechanism (MWAM) for SIM. Our results show that the MWAM network can extract the high-frequency information contained in SIM raw images and accurately integrate it into the output image, resulting in superior SR images compared to those generated using wide-field images as input data. We also demonstrate that the number of SIM raw images can be reduced to three, with one image in each illumination orientation, to achieve the optimal tradeoff between temporal and spatial resolution. Furthermore, our MWAM network exhibits superior reconstruction ability on low-SNR images compared to conventional SIM algorithms. We have also analyzed the adaptability of this network to other biological samples and successfully applied the pretrained model to other SIM systems.
Funding: Funded by the National Natural Science Foundation of China (No. 62076044) and the Scientific Research Foundation of Chongqing University of Technology (No. 2020ZDZ015).
Abstract: Alzheimer's disease (AD) is a complex, progressive neurodegenerative disorder. The subtle and insidious onset of its pathogenesis makes early detection a formidable challenge in both contemporary neuroscience and clinical practice. In this study, we introduce an advanced diagnostic methodology rooted in the Med-3D transfer model and enhanced with an attention mechanism. We aim to improve the precision of AD diagnosis and facilitate its early identification. Initially, we employ a spatial normalization technique to address challenges like clarity degradation and unsaturation, which are commonly observed in imaging datasets. Subsequently, an attention mechanism is incorporated to selectively focus on the salient features within the imaging data. Building upon this foundation, we present the novel Med-3D transfer model, designed to further elucidate and amplify the intricate features associated with AD pathogenesis. Our proposed model has demonstrated promising results, achieving a classification accuracy of 92%. To emphasize the robustness and practicality of our approach, we introduce an adaptive 'hot-updating' auxiliary diagnostic system. This system not only enables continuous model training and optimization but also provides a dynamic platform to meet the real-time diagnostic and therapeutic demands of AD.
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61672378, 61771339, and 61520106002.
Abstract: In recent years, convolutional neural networks (CNNs) for single image super-resolution (SISR) have become increasingly complex, and improving SISR performance has become more challenging. In contrast, reference image guided super-resolution (RefSR) is an effective strategy for boosting super-resolution (SR) performance. In RefSR, the introduced high-resolution (HR) references can facilitate the high-frequency residual prediction process. To the best of our knowledge, existing CNN-based RefSR methods treat the features from the references and the low-resolution (LR) input equally by simply concatenating them. However, the HR references and the LR inputs contribute differently to the final SR results. Therefore, we propose a progressive channel attention network (PCANet) for RefSR. This paper makes two technical contributions. First, we propose a novel channel attention module (CAM), which estimates the channel weighting parameter by a weighted average of the spatial features instead of a global average. Second, considering that the residual prediction process can be improved when the LR input is enriched with more details, we perform super-resolution progressively, which takes advantage of the reference images at multiple scales. Extensive quantitative and qualitative evaluations on three benchmark datasets, which represent three typical scenarios for RefSR, demonstrate that our method is superior to state-of-the-art SISR and RefSR methods in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).
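The difference between global averaging and a weighted spatial average, as contrasted in the abstract above, can be illustrated with a minimal sketch. It is not the PCANet implementation: the `scores` map is a hypothetical stand-in for whatever learned layer produces the spatial weights, and the softmax normalization is an assumption made only for this example.

```python
import math

def global_average(channel):
    """Plain global average pooling over one 2-D feature map (list of rows)."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)

def weighted_average(channel, scores):
    """Spatially weighted average of one feature map.

    `scores` has the same shape as `channel`; softmax(scores) gives one
    non-negative weight per position, and the weights sum to 1.
    In a real network the scores would be learned (assumption).
    """
    values = [v for row in channel for v in row]
    logits = [s for row in scores for s in row]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in logits]
    total = sum(exps)
    return sum(v * e / total for v, e in zip(values, exps))

channel = [[1.0, 2.0], [3.0, 4.0]]
uniform_scores = [[0.0, 0.0], [0.0, 0.0]]   # every position counts equally
peaked_scores = [[0.0, 0.0], [0.0, 50.0]]   # almost all weight on the last position

print(global_average(channel))
print(weighted_average(channel, uniform_scores))
print(weighted_average(channel, peaked_scores))
```

With uniform scores the weighted average reduces to the global average, so global pooling is the special case in which no position is preferred; a peaked score map lets a few informative positions dominate the channel descriptor.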
Funding: Support from the Ministry of Science and Technology of the People's Republic of China (Grant No. 2021YFB2600804), the Open Research Project Programme of the State Key Laboratory of Internet of Things for Smart City (University of Macao) (Grant No. SKL-IoTSC(UM)-2021-2023/ORPF/A19/2022), and the General Research Fund (GRF) project (Grant No. 15214722) from the Research Grants Council (RGC) of the Hong Kong Special Administrative Region Government of China is gratefully acknowledged.
Abstract: This research developed a hybrid position-channel network (named PCNet) by incorporating newly designed channel and position attention modules into U-Net to alleviate the crack discontinuity problem in the channel and spatial dimensions. In PCNet, U-Net is used as a baseline to extract informative spatial and channel-wise features from shield tunnel lining crack images. A channel and a position attention module are designed and embedded after each convolution layer of U-Net to model the feature interdependencies in the channel and spatial dimensions. These attention modules enable U-Net to adaptively integrate local crack features with their global dependencies. Experiments were conducted using a dataset based on images from Shanghai metro shield tunnels. The results validate the effectiveness of the designed channel and position attention modules, since they individually increase balanced accuracy (BA) by 11.25% and 12.95%, intersection over union (IoU) by 10.79% and 11.83%, and F1 score by 9.96% and 10.63%, respectively. In comparison with state-of-the-art models (i.e., LinkNet, PSPNet, U-Net, PANet, and Mask R-CNN) on the testing dataset, the proposed PCNet achieves higher BA, IoU, and F1 score owing to the channel and position attention modules. These evaluation metrics indicate that the proposed PCNet delivers refined crack segmentation with improved performance and is a practicable approach to segmenting shield tunnel lining cracks in field practice.
Funding: National Natural Science Foundation of China, Grant/Award Number: 62071039; Beijing Natural Science Foundation, Grant/Award Number: L223033.
Abstract: End-to-end separation algorithms, which perform very well in speech separation, have not been used effectively in music separation. Moreover, since music signals are often dual-channel data with a high sampling rate, modeling long-sequence data and making rational use of the information shared between channels are also pressing problems. To solve these problems, the performance of the end-to-end music separation algorithm is enhanced by improving the network structure. Our main contributions are as follows: (1) A more reasonable densely connected U-Net is designed to capture the long-term characteristics of music, such as the main melody and tone. (2) On this basis, multi-head attention and a dual-path transformer are introduced in the separation module. Channel attention units are applied recursively to the feature map of each layer of the network, enabling the network to perform long-sequence separation. Experimental results show that after the introduction of channel attention, the proposed algorithm improves consistently over the baseline system. On the MUSDB18 dataset, the average score of the separated audio exceeds that of the current best-performing music separation algorithm based on the time-frequency domain (T-F domain).
Funding: This work was supported by the Sichuan Science and Technology Program (2021YFQ0003).
Abstract: Visual question answering (VQA) has attracted increasing attention in computer vision and natural language processing. Researchers are studying how to better integrate image features and text features to achieve better results in VQA tasks. Analyzing all features may cause information redundancy and a heavy computational burden. An attention mechanism is a sensible way to solve this problem. However, using a single attention mechanism may attend to the features incompletely. This paper improves the attention mechanism and proposes a hybrid attention mechanism that combines spatial attention and channel attention. Because the attention mechanism can cause loss of the original features, a small portion of the image features is added back as compensation. For the text features, a self-attention mechanism is introduced, and the internal structural features of sentences are strengthened to improve the overall model. The results show that the attention mechanism and feature compensation add 6.1% accuracy to the multimodal low-rank bilinear pooling network.
Abstract: This paper proposes an end-to-end channel attention and pixel attention network (CP-Net) that produces dehazed images directly. The CP-Net structure contains three critical components. The first is the double attention (DA) module, which consists of channel attention (CA) and pixel attention (PA). Different channel features contain different levels of important information, and CA gives more weight to relevant information so that the network can learn more useful information. Meanwhile, haze is unevenly distributed across pixels, and PA filters out haze with varying weights for different pixels. The DA module sums the outputs of the two attention modules to further improve the feature representation, which contributes to better dehazing results. Second, local residual learning and the DA module constitute another important component, the basic block structure. Local residual learning transfers the feature information in the shallow part of the network to the deep part through multiple local residual connections and enhances the expressive ability of CP-Net. Third, CP-Net mainly uses its core component, the DA module, to automatically assign different weights to different features and achieve a satisfactory dehazing effect. Experimental results on synthetic datasets and real hazy images indicate that CP-Net surpasses many state-of-the-art single image dehazing methods both quantitatively and qualitatively.
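The data flow of the double attention module described above (one weight per channel, one weight per pixel, outputs summed) can be sketched as follows. This is a simplified illustration under stated assumptions, not the CP-Net code: the learned convolutions that would normally produce the weights are replaced by a sigmoid of simple averages, and all function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feat):
    """Scale each channel by a single weight derived from its global average.

    `feat` is a list of channels, each a list of rows (C x H x W).
    """
    out = []
    for channel in feat:
        count = sum(len(row) for row in channel)
        weight = sigmoid(sum(sum(row) for row in channel) / count)
        out.append([[weight * v for v in row] for row in channel])
    return out

def pixel_attention(feat):
    """Scale each spatial position by a single weight shared by all channels."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    weights = [[sigmoid(sum(feat[k][i][j] for k in range(c)) / c)
                for j in range(w)] for i in range(h)]
    return [[[weights[i][j] * feat[k][i][j] for j in range(w)]
             for i in range(h)] for k in range(c)]

def double_attention(feat):
    """Sum the outputs of the channel branch and the pixel branch."""
    ca = channel_attention(feat)
    pa = pixel_attention(feat)
    return [[[a + b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(ca, pa)]

# Two channels, one row, two pixels.
features = [[[2.0, 0.0]], [[0.0, 0.0]]]
print(double_attention(features))
```

The output keeps the input shape, so a block like this can be dropped between layers or wrapped in a local residual connection without changing tensor dimensions.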
Funding: Funded by the Ministry of Science and Technology of the People's Republic of China, Grant Number 2022YFC3800502; Chongqing Science and Technology Commission, Grant Numbers cstc2020jscx-dxwtBX0019, CSTB2022TIAD-KPX0118, cstc2020jscx-cylhX0005, and cstc2021jscx-gksbX0058.
Abstract: Vehicle detection plays a crucial role in autonomous driving technology. However, directly applying deep learning-based object detection algorithms to complex road scene images often leads to subpar accuracy and slow inference in vehicle detection. Balancing accuracy and detection speed is crucial for real-time object detection in real-world road scenes. This paper proposes a high-precision and fast vehicle detector called the feature-guided bidirectional pyramid network (FBPN). First, to tackle challenges such as vehicle occlusion and significant background interference, the efficient feature filtering module (EFFM) is introduced into the deep network to amplify the disparities between the features of the vehicle and the background. Second, the proposed global attention localization module (GALM) in the model neck effectively perceives the detailed position information of the target, improving both the accuracy and the inference speed of the model. Finally, the detection accuracy of small-scale vehicles is further enhanced by a four-layer feature pyramid structure. Experimental results show that FBPN achieves an average precision of 60.8% and 97.8% on the BDD100K and KITTI datasets, respectively, with inference speeds reaching 344.83 frames/s and 357.14 frames/s. FBPN demonstrates its effectiveness by striking a balance between detection accuracy and inference speed, outperforming several state-of-the-art methods.
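To put the reported throughput in more familiar terms, frames per second can be converted to an average per-frame time. The short sketch below only performs that unit conversion; the helper name `latency_ms` is hypothetical, and the result says nothing about how the original timing was measured.

```python
def latency_ms(frames_per_second):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / frames_per_second

for dataset, fps in [("BDD100K", 344.83), ("KITTI", 357.14)]:
    print(f"{dataset}: {fps} frames/s is about {latency_ms(fps):.2f} ms per frame")
```

Both figures correspond to roughly 3 ms per frame, well under the 33 ms budget of a 30 frames/s camera stream.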
Funding: Supported by the National Natural Science Foundation of China (61877067); the Foundation of Science and Technology on Near-Surface Detection Laboratory (TCGZ2019A002, TCGZ2021C003, 6142414200511); the Natural Science Basic Research Program of Shaanxi (2021JZ-19).
Abstract: Acoustic source localization (ASL) and sound event detection (SED) are two widely pursued independent research fields. In recent years, sound event localization and detection (SELD) has become a very active research topic because it offers a more complete spatial and temporal representation of the sound field. This paper presents a deep learning-based algorithm for localizing and detecting multiple overlapping sound events in three-dimensional space. The log-Mel spectrum and the generalized cross-correlation spectrum are concatenated along the channel dimension as input features. After training, a neural network classifies and regresses these features in parallel to obtain the sound recognition and localization results, respectively. A channel attention mechanism is also introduced in the network to selectively enhance the features containing essential information and suppress useless features. Finally, a thorough comparison confirms the efficiency and effectiveness of the proposed SELD algorithm. Field experiments show that the proposed algorithm is robust to reverberation and environment and achieves higher recognition and localization accuracy than the baseline method.
Funding: This work was supported by the National Key R&D Program of China (Grant No. 2020YFC1511601) and the Fundamental Research Funds for the Central Universities (Grant No. 2019SHFWLC01).
Abstract: Almost all existing deep learning methods rely on a large amount of annotated data, so they are unsuitable for forest fire smoke detection with limited data. In this paper, a novel hybrid attention-based few-shot learning method, named Attention-Based Prototypical Network, is proposed for forest fire smoke detection. Specifically, the feature extraction network, which consists of convolutional block attention modules, extracts high-level, discriminative features and further decreases the false alarm rate caused by suspected smoke areas. Moreover, we design a meta-learning module to alleviate the overfitting caused by limited smoke images; the meta-learning network achieves effective detection by comparing the distance between the class prototypes of the support images and the features of the query images. A series of experiments on forest fire smoke datasets and the miniImageNet dataset show that the proposed method is superior to state-of-the-art few-shot learning approaches.
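The prototype comparison mentioned above follows the general prototypical-network recipe: average the support embeddings of each class into a prototype, then assign a query to the nearest prototype. The sketch below shows that recipe on toy two-dimensional embeddings. It assumes squared Euclidean distance and hand-written vectors in place of the attention-based feature extractor, so the class names and numbers are purely illustrative.

```python
import math

def prototype(embeddings):
    """Class prototype: the element-wise mean of the support embeddings."""
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(query, support):
    """Return (predicted label, class probabilities) for one query embedding.

    `support` maps each class label to a list of embedding vectors.
    Probabilities are a softmax over negative squared distances.
    """
    logits = {label: -squared_distance(query, prototype(vectors))
              for label, vectors in support.items()}
    peak = max(logits.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    return max(probs, key=probs.get), probs

# Toy 2-way, 2-shot episode with made-up embeddings.
support_set = {
    "smoke": [[1.0, 1.0], [3.0, 1.0]],
    "no_smoke": [[-2.0, 0.0], [-2.0, -2.0]],
}
label, probabilities = classify([1.5, 0.5], support_set)
print(label, probabilities)
```

Because the classifier has no per-class parameters of its own, new classes can be handled by supplying a few support examples, which is what makes the approach attractive when labeled smoke images are scarce.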
Funding: Sponsored by the Autonomous Region Key R&D Task Special (2022B01008) and the National Key R&D Program of China (SQ2022AAA010308-5).
Abstract: Network intrusion detection systems (NIDS) based on deep learning have continued to make significant advances. However, challenges remain. On the one hand, applying only Temporal Convolutional Networks (TCNs) can lead to models that ignore the impact of network traffic features at different scales on detection performance. On the other hand, some intrusion detection methods consider multi-scale information of traffic data, but considering only forward network traffic information can lead to deficiencies in capturing multi-scale temporal features. To address both issues, we propose a hybrid Convolutional Neural Network that supports a multi-output strategy (BONUS) for industrial internet intrusion detection. First, we create a multi-scale Temporal Convolutional Network by stacking TCNs of different scales to capture the multi-scale information of network traffic. Meanwhile, we propose a bi-directional structure and dynamically set the weights to fuse the forward and backward contextual information of network traffic at each scale, which enhances the model's ability to capture the multi-scale temporal features of network traffic. In addition, we introduce a gated network for each of the two branches in the proposed method to help the model learn the feature representation of each branch. Extensive experiments demonstrate the effectiveness of the proposed approach on two publicly available traffic intrusion detection datasets, UNSW-NB15 and NSL-KDD, with F1 scores of 85.03% and 99.31%, respectively. The results also confirm that enhancing the model's ability to capture multi-scale temporal features of traffic data improves detection performance.
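The two ingredients named above, dilated causal convolution (the core TCN operation) and the fusion of forward and backward context, can be sketched on a one-dimensional sequence. This is a minimal illustration, not the BONUS architecture: the kernel values are arbitrary, and the fusion weight `alpha` is a fixed hypothetical constant, whereas the abstract describes weights that are set dynamically.

```python
def causal_dilated_conv(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], treating x as zero before t = 0.

    Each output depends only on the current and earlier inputs (causality);
    a larger dilation widens the time span covered by the same kernel.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += weight * x[idx]
        out.append(acc)
    return out

def bidirectional_fuse(x, kernel, dilation, alpha):
    """Blend forward context with backward context (the same filter run on reversed input)."""
    forward = causal_dilated_conv(x, kernel, dilation)
    backward = causal_dilated_conv(x[::-1], kernel, dilation)[::-1]
    return [alpha * f + (1.0 - alpha) * b for f, b in zip(forward, backward)]

signal = [1, 2, 3, 4]
print(causal_dilated_conv(signal, [1.0, 1.0], dilation=2))
print(bidirectional_fuse(signal, [1.0, 1.0], dilation=2, alpha=0.5))
```

Stacking such layers with different dilations is one common way to obtain several temporal scales; running the same filter over the reversed sequence gives each time step access to later traffic as well as earlier traffic.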
Funding: Supported by the National Natural Science Foundation of China under Grants 62172059, 61972057, and 62072055; Hunan Provincial Natural Science Foundations of China under Grant 2020JJ4626; Scientific Research Fund of Hunan Provincial Education Department of China under Grant 19B004; Postgraduate Scientific Research Innovation Project of Hunan Province under Grant CX20210811.
Abstract: Image inpainting based on deep learning has improved greatly. The original purpose of image inpainting was to repair damaged photos, such as inpainting artifacts. However, it may also be used for malicious operations, such as destroying evidence. Therefore, detecting and localizing image inpainting operations is essential. Recent research shows that the high-pass filtering fully convolutional network (HPFCN), applied to image inpainting detection, achieves good results. However, such methods do not consider the spatial location and channel information of the feature map. To address these shortcomings, we introduce squeeze-and-excitation (SE) blocks and propose a high-pass filter attention fully convolutional network (HPACN). In feature extraction, we apply concurrent spatial and channel attention (scSE) to enhance feature extraction and obtain more information. Channel attention (cSE) is introduced in upsampling to enhance detection and localization. The experimental results show that the proposed method achieves an improvement on ImageNet.
Funding: This work was supported in part by the Natural Science Foundation of China under Grants 62063004 and 61762033, in part by the Hainan Provincial Natural Science Foundation of China under Grants 2019RC018 and 619QN246, and by the Postdoctoral Science Foundation under Grant 2020TQ0293.
Abstract: Recent applications of convolutional neural networks (CNNs) in single image super-resolution (SISR) have achieved unprecedented performance. However, existing CNN-based SISR network designs mostly consider only channel or only spatial information and cannot make full use of both to further improve SISR performance. The present work addresses this problem by proposing a mixed attention densely residual network architecture that can make full and simultaneous use of both channel and spatial information. Specifically, we propose a residual-in-dense network structure composed of dense connections between multiple dense residual groups to form a very deep network. This structure allows each dense residual group to apply a local residual skip connection and enables the cascading of multiple residual blocks to reuse previous features. A mixed attention module is inserted into each dense residual group to enable the algorithm to fuse channel attention with Laplacian spatial attention effectively and thereby focus more adaptively on valuable feature learning. The qualitative and quantitative results of extensive experiments demonstrate that the proposed method performs comparably to other state-of-the-art methods.
Funding: The National Natural Science Foundation of China (Nos. 62276210, 82201148, 61775180); the Natural Science Basic Research Program of Shaanxi Province (No. 2022JM-380); the Shaanxi Province College Students' Innovation and Entrepreneurship Training Program (No. S202311664128X); the Natural Science Foundation of Zhejiang Province (No. LQ22H120002); the Medical Health Science and Technology Project of Zhejiang Province (Nos. 2022RC069, 2023KY1140); the Natural Science Foundation of Ningbo (No. 2023J390).
Abstract: Cataract is the leading cause of visual impairment globally. The scarcity and uneven distribution of ophthalmologists seriously hinder early visual impairment grading for cataract patients in the clinic. In this study, a deep learning-based automated grading system of visual impairment in cataract patients is proposed using a multi-scale efficient channel attention convolutional neural network (MECA_CNN). First, the efficient channel attention mechanism is applied in the MECA_CNN to extract multi-scale features of fundus images, which can effectively focus on lesion-related regions. Then, asymmetric convolutional modules are embedded in the residual unit to reduce the information loss of fine-grained features in fundus images. In addition, the asymmetric loss function is applied to address the higher false-negative rate and weak generalization ability caused by the imbalanced dataset. A total of 7299 fundus images from two clinical centers are used to develop and evaluate the MECA_CNN for identifying mild visual impairment caused by cataract (MVICC), moderate to severe visual impairment caused by cataract (MSVICC), and normal samples. The experimental results demonstrate that the MECA_CNN provides clinically meaningful performance for visual impairment grading in the internal test dataset: MVICC (accuracy, sensitivity, and specificity: 91.3%, 89.9%, and 92%), MSVICC (93.2%, 78.5%, and 96.7%), and normal samples (98.1%, 98.0%, and 98.1%). Comparable performance is achieved in the external test dataset, further verifying the effectiveness and generalizability of the MECA_CNN model. This study provides a practical deep learning-based system for the automated grading of visual impairment in cataract patients, facilitating the timely formulation of treatment strategies and improving patients' vision prognosis.
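Per-class accuracy, sensitivity, and specificity of the kind reported above are usually computed one class at a time, treating that class as positive and all others as negative. The sketch below shows that standard one-vs-rest calculation on a handful of made-up labels; the data are invented for illustration and have no connection to the study's test sets.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, and specificity with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Guard against classes that never occur (convention chosen here: report 0.0).
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity

# Hypothetical labels for six images.
truth = ["mild", "mild", "severe", "normal", "normal", "normal"]
predicted = ["mild", "normal", "severe", "normal", "normal", "mild"]

for grade in ["mild", "severe", "normal"]:
    print(grade, one_vs_rest_metrics(truth, predicted, grade))
```

Sensitivity is the figure most affected by missed cases, which is why a class can combine high accuracy and specificity with a noticeably lower sensitivity, as the MSVICC figures do.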
Funding: Supported by the National Natural Science Foundation of China (Grant No. 31671571); the Shanxi Province Basic Research Program Project (Free Exploration) (Nos. 20210302124523, 20210302123408, 202103021224149, and 202103021223141); the Youth Agricultural Science and Technology Innovation Fund of Shanxi Agricultural University (Grant No. 2019027).
Abstract: Separating individual pigs from pigpen scenes is crucial for precision farming, and technology based on convolutional neural networks can provide a low-cost, non-contact, non-invasive method of pig image segmentation. However, two factors limit the development of this field. On the one hand, individual pigs tend to stick together, and occlusion by clutter such as pigpens can easily cause the model to misjudge. On the other hand, manual labeling of group-raised pig data is time-consuming, labor-intensive, and prone to labeling errors. Therefore, there is an urgent need for an individual pig image segmentation model that performs well in individual scenarios and can be easily transferred to a group-raised environment. To solve these problems, taking individual pigs as research objects, an individual pig image segmentation dataset containing 2066 images was constructed, and a series of algorithms based on fully convolutional networks were proposed to solve the pig image segmentation problem. To capture long-range dependencies and weaken background information such as pigpens while enhancing the information of individual parts of pigs, channel and spatial attention blocks were introduced into the best-performing decoders, UNet and LinkNet. Experiments show that, with ResNext50 as the encoder and UNet as the decoder in the basic model, adding both attention blocks at the same time achieves 98.30% and 96.71% on the F1 and IoU metrics, respectively. Compared with the model that adds the channel attention block alone, the two metrics are improved by 0.13% and 0.22%, respectively. The experiments that introduce channel and spatial attention separately show that spatial attention is more effective than channel attention. Taking VGG16-LinkNet as an example, compared with channel attention, spatial attention improves the F1 and IoU metrics by 0.16% and 0.30%, respectively. Furthermore, the heatmaps of the features from different decoder layers after adding different attention information show that the boundary of the pig segmentation becomes clearer as the number of layers increases. To verify the effectiveness of the individual pig image segmentation model in group-raised scenes, the transfer performance of the model is verified in three scenarios: high separation, deep adhesion, and pigpen occlusion. The experiments show that the segmentation results obtained by adding attention information, especially the simultaneous fusion of channel and spatial attention blocks, are more refined and complete. The attention-based individual pig image segmentation model can be effectively transferred to the field of group-raised pigs and can provide a reference for its pre-segmentation.
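For readers unfamiliar with the two segmentation metrics quoted above, the following sketch computes F1 (equivalently the Dice coefficient) and IoU from a predicted and a ground-truth binary mask. The masks are tiny invented examples, and returning 1.0 for two empty masks is a convention chosen here, not something taken from the paper.

```python
def f1_and_iou(pred, target):
    """F1 (Dice) and IoU for two binary masks given as lists of rows of 0/1."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1      # foreground predicted and present
            elif p and not t:
                fp += 1      # foreground predicted but absent
            elif t and not p:
                fn += 1      # foreground missed
    if tp + fp + fn == 0:
        return 1.0, 1.0      # both masks empty: treated as a perfect match (assumption)
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou

predicted_mask = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [0, 0]]
f1, iou = f1_and_iou(predicted_mask, ground_truth)
print(f1, iou)
```

For a single pair of masks the two metrics are tied by F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU; scores averaged over many images need not satisfy the identity exactly.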
Funding: Supported in part by the National Natural Science Foundation of China (Grant Number 61971078) and the Chongqing University of Technology Graduate Innovation Foundation (Grant Number gzlcx20222064).
Abstract: As image editing technology improves, the threshold for image tampering falls, which reduces the authenticity of image content. This has driven research on image forgery detection techniques. In this paper, a U-Net with multiple receptive field feature extraction (MSCU-Net) is proposed for image forgery detection. The proposed MSCU-Net is an end-to-end image essential attribute segmentation network that can perform image forgery detection without any pre-processing or post-processing. MSCU-Net replaces the single-scale convolution module in the original network with an improved multiple receptive field convolution module so that the decoder can synthesize features from different receptive fields. It uses residual propagation and residual feedback to recall and consolidate the input feature information, making the difference in image attributes between the untampered and tampered regions more obvious. It also introduces the channel coordinate confusion attention mechanism (CCCA) in the skip connections to further improve the segmentation accuracy of the network. Extensive experiments on various mainstream datasets verify the effectiveness of the proposed method, which outperforms state-of-the-art image forgery detection methods.