The degradation of optical remote sensing images by atmospheric haze is a significant obstacle to their effective use across many domains. Dehazing methods have become a key part of image preprocessing, improving the quality of remote sensing imagery and, in turn, the accuracy of target identification. Conventional dehazing techniques based on simple atmospheric degradation models have proven inadequate for non-uniform haze in remotely sensed images. In response, a novel UNet Residual Attention Network (URA-Net) is proposed. It is an end-to-end convolutional neural network built on multi-scale dense feature fusion clusters and gated jump (skip) connections. The core of the method is local feature fusion within dense residual clusters, which extracts pertinent features from both preceding and current local data according to context. The gated structures propagate these features to the decoder, improving haze removal. Extensive experiments show that URA-Net outperforms existing methods on established remote sensing dehazing datasets. On the RICE-1 dataset, URA-Net achieves a Peak Signal-to-Noise Ratio (PSNR) of 29.07 dB, surpassing the Dark Channel Prior (DCP) by 11.17 dB, the All-in-One Network for Dehazing (AOD) by 7.82 dB, the Optimal Transmission Map and Adaptive Atmospheric Light for Dehazing (OTM-AAL) by 5.37 dB, the Unsupervised Single Image Dehazing (USID) by 8.0 dB, and the Superpixel-based Remote Sensing Image Dehazing (SRD) by 8.5 dB. On the SateHaze1k dataset, URA-Net attains the best overall performance, yielding dehazed images of consistent visual quality. The work thus provides a robust and efficient solution for alleviating the adverse effects of haze on remote sensing image quality.
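The PSNR figures quoted for URA-Net can be read against the standard definition of the metric. The following is a minimal sketch in plain Python, assuming 8-bit images stored as nested lists; the function names `psnr` and `mse_ratio_from_gain` and the toy pixel values are illustrative and are not taken from the paper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak Signal-to-Noise Ratio (dB) between two equally sized images.

    Images are lists of rows of pixel intensities (an assumption made
    for this sketch; real pipelines usually use arrays).
    """
    ref_flat = [p for row in reference for p in row]
    out_flat = [p for row in restored for p in row]
    if len(ref_flat) != len(out_flat):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref_flat, out_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_from_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)


# Toy 2x2 example (hypothetical values).
clean = [[100, 120], [140, 160]]
dehazed = [[101, 119], [142, 158]]
print(round(psnr(clean, dehazed), 2))
print(round(mse_ratio_from_gain(11.17), 1))
```

Because PSNR is logarithmic in the mean squared error, the reported 11.17 dB margin over DCP corresponds to a mean squared error that is lower by roughly a factor of 13, provided both scores were computed with the same peak value on the same images.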
Detecting brain tumours is complex because their location, shape, and intensity vary naturally across images. Although accurate detection and segmentation of brain tumours would be beneficial, current methods have yet to solve this problem despite the numerous available approaches. Precise analysis of Magnetic Resonance Imaging (MRI) is crucial for detecting, segmenting, and classifying brain tumours in medical diagnostics, and it requires precise, careful, efficient, and reliable image analysis techniques. The authors developed a Deep Learning (DL) fusion model to classify brain tumours reliably. DL models require large amounts of training data to achieve good results, so the researchers used data augmentation to increase the size of the training dataset. VGG16, ResNet50, and convolutional deep belief networks extracted deep features from MRI images. Softmax was used as the classifier, and the training set was supplemented with artificially created MRI images of brain tumours in addition to the genuine ones. The features of two DL models were combined in the proposed model to generate a fusion model, which significantly increased classification accuracy. An openly accessible dataset was used to test the model's performance, and the experimental results showed that the proposed fusion model achieved a classification accuracy of 98.98%. Finally, the results were compared with existing methods, and the proposed model outperformed them significantly.
To address the very high-resolution (VHR) image classification problem, a feature extraction and fusion framework based on deep learning is presented for VHR panchromatic and multispectral image classification. The proposed approach combines spectral and spatial information by fusing features extracted from panchromatic (PAN) and multispectral (MS) images using a sparse autoencoder and its deep version. The method has three steps: the first extracts spatial information from the PAN image, the second describes spectral information of the MS image, and the third concatenates the features obtained from the PAN and MS images directly into a simple fused feature. Classification is performed with a support vector machine (SVM), and experiments are carried out on two datasets with very high spatial resolution. Results on MS and PAN images from the WorldView-2 satellite indicate that the classifier provides an efficient solution and that fusing the features extracted by deep learning from PAN and MS images performs better than using these techniques separately. In addition, the framework shows that deep learning models can extract and fuse spatial and spectral information effectively and have considerable potential to achieve higher accuracy in classifying multispectral and panchromatic images.
Because of cloudy and rainy weather in south China, optical remote sensing images are often difficult to obtain. Using regional trial results from Baoying, Jiangsu province, this paper explores the fusion model and fusion effect of ENVISAT SAR and HJ-1A satellite multispectral remote sensing images. Based on the ARSIS strategy, and using the wavelet transform and the Inter-Band Structure Model (IBSM), the study performed wavelet decomposition of the ENVISAT SAR and HJ-1A CCD images, reconstructed the low- and high-frequency coefficients, and obtained the fused images through the inverse wavelet transform. Because low- and high-frequency images have different characteristics in different areas, different fusion rules were adopted to make the integration process more self-adaptive. The results were compared with PCA transformation, IHS transformation, and other traditional methods through subjective and corresponding quantitative evaluation. Furthermore, the study extracted band values and NDVI values before and after fusion at GPS sample points to analyze and explain the fusion effect. The results showed that the spectral distortion of the wavelet fusion, IHS transform, and PCA transform images was 0.1016, 0.3261, and 1.2772, respectively, and the entropy was 14.7015, 11.8993, and 13.2293, respectively, with wavelet fusion giving the highest entropy. The wavelet method maintained good spectral fidelity and visual quality while improving spatial resolution, and its information interpretation effect was much better than that of the other two methods.
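The NDVI check mentioned in this abstract can be illustrated with the standard index definition, NDVI = (NIR - Red) / (NIR + Red). The sketch below is an assumption-laden illustration, not the authors' procedure: the helper names `ndvi` and `mean_abs_ndvi_change` and the reflectance values at the sample points are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # convention chosen for this sketch to avoid division by zero
    return (nir - red) / total


def mean_abs_ndvi_change(before, after):
    """Average absolute NDVI change over sample points.

    `before` and `after` are lists of (nir, red) pairs taken at the same
    sample locations in the original and fused images.
    """
    changes = [abs(ndvi(*a) - ndvi(*b)) for b, a in zip(before, after)]
    return sum(changes) / len(changes)


# Hypothetical reflectances at three sample points.
original = [(0.50, 0.10), (0.40, 0.20), (0.30, 0.25)]
fused = [(0.48, 0.11), (0.41, 0.20), (0.29, 0.26)]
print(mean_abs_ndvi_change(original, fused))
```

A small average change would suggest that the fusion preserved the spectral relationship between the near-infrared and red bands at those points, which is one way to interpret a low spectral distortion score.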
Gliomas have the highest mortality rate of all brain tumors. Correctly classifying the glioma risk grade can help doctors make reasonable treatment plans and improve patients' survival rates. This paper proposes a hierarchical multi-scale attention feature fusion medical image classification network (HMAC-Net), which effectively combines global and local features. The network framework consists of three parallel layers: the global feature extraction layer, the local feature extraction layer, and the multi-scale feature fusion layer. A linear sparse attention mechanism is designed in the global feature extraction layer to reduce information redundancy. In the local feature extraction layer, a bilateral local attention mechanism is introduced to improve the extraction of relevant information between adjacent slices. In the multi-scale feature fusion layer, a channel fusion block combining a convolutional attention mechanism and a residual inverse multi-layer perceptron is proposed to prevent vanishing gradients and network degradation and to improve feature representation capability. A double-branch iterative multi-scale classification block is used to improve classification performance. On the brain glioma risk grading dataset, ablation and comparison experiments show that the proposed HMAC-Net performs best in both qualitative analysis of heat maps and quantitative analysis of evaluation indicators. On a skin cancer classification dataset, generalization experiments show that HMAC-Net generalizes well.
Automatic segmentation of medical images provides a reliable scientific basis for disease diagnosis and analysis. Methods that combine the strengths of convolutional neural networks (CNNs) and Transformers have made significant progress. However, the current integration of CNN and Transformer technology has limitations in two key aspects. First, most methods either overlook or fail to fully exploit the complementary nature of local and global features. Second, methods that combine CNN and Transformer often disregard the value of integrating the multi-scale encoder features from the dual-branch network to enhance the decoding features. To address these issues, we present a dual-branch cross-attention fusion network (DCFNet), which combines a Swin Transformer and a CNN to generate complementary global and local features. We then design the Feature Cross-Fusion (FCF) module to fuse local and global features efficiently. In the FCF, the Channel-wise Cross-fusion Transformer (CCT) aggregates multi-scale features, and the Feature Fusion Module (FFM) aggregates prominent feature regions of the two branches from the spatial perspective. Furthermore, in the decoding phase of the dual-branch network, the proposed Channel Attention Block (CAB) emphasizes the significance of the channel features between the up-sampled features and the features generated by the FCF module to enhance decoding details. Experimental results demonstrate that DCFNet improves segmentation accuracy and is competitive with other state-of-the-art (SOTA) methods. DCFNet's accurate segmentation of medical images can assist medical professionals in diagnosing lesion areas in advance.
Depression is a common mental health disorder. With current depression detection methods, specialized physicians often rely on conversations and physiological examinations based on standardized scales as auxiliary measures for depression assessment. Non-biological markers, typically classified as verbal or non-verbal and deemed crucial evaluation criteria for depression, have not been used effectively. Specialized physicians usually require extensive training and experience to capture changes in these features. Advances in deep learning have provided technical support for capturing non-biological markers. Several researchers have proposed automatic depression estimation (ADE) systems based on sounds and videos to assist physicians in capturing these features and conducting depression screening. This article summarizes commonly used public datasets and recent research on audio- and video-based ADE from three perspectives: datasets, deficiencies in existing research, and future development directions.
Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise separable convolution to reduce network complexity. MobileNetV1 uses a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To balance complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on MobileNetV1 is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, the depthwise dilated convolution in the DDSC layer effectively expands the field of view of the filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that processes the input feature map with parallel multi-resolution branches to extract multi-scale feature information from the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining network performance compared with the MobileNetV1 baseline.
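The parameter savings and wider field of view attributed to depthwise dilated separable convolution follow from simple counting. The sketch below uses the standard formulas for a single layer with biases ignored; the function names and the 3x3, 64-to-128-channel example are illustrative assumptions and do not reproduce the paper's architecture or its reported 65.02% figure.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


def effective_kernel_size(kernel, dilation):
    """Span covered by a dilated kernel along one axis."""
    return kernel + (kernel - 1) * (dilation - 1)


# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
dws = depthwise_separable_params(3, 64, 128)
print(std, dws, round(1 - dws / std, 4))
print(effective_kernel_size(3, 2))
```

Dilation enlarges the span of each filter without adding weights, which is why the field of view can grow while the parameter count stays at the depthwise separable level.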
Low-light image enhancement methods have limitations in addressing color distortion, lack of vibrancy, and uneven light distribution, and they often require paired training data. To address these issues, we propose a two-stage unsupervised low-light image enhancement algorithm called Retinex and Exposure Fusion Network (RFNet), which overcomes the over-enhancement of the high dynamic range and under-enhancement of the low dynamic range seen in existing enhancement algorithms. By training with unpaired low-light and regular-light images, the algorithm can better handle the challenges of complex real-world environments. In the first stage, we design a multi-scale feature extraction module based on Retinex theory that extracts details and structural information at different scales to generate high-quality illumination and reflection images. In the second stage, an exposure image generator is designed through the camera response mechanism function to acquire exposure images containing more dark features, and the generated images are fused with the original input images to complete the low-light image enhancement. Experiments show the effectiveness and rationality of each module designed in this paper. The method reconstructs the details of contrast and color distribution, outperforms current state-of-the-art methods in both qualitative and quantitative metrics, and performs well in real-world scenes.
Image classification based on bag-of-words (BOW) has broad application prospects in pattern recognition, but shortcomings such as reliance on a single feature and low classification accuracy are apparent. To deal with this problem, this paper proposes to combine two ingredients: (i) three mutually complementary features are adopted to describe the images, namely pyramid histogram of words (PHOW), pyramid histogram of color (PHOC), and pyramid histogram of oriented gradients (PHOG); (ii) an adaptive feature-weight adjusted image categorization algorithm based on the SVM and decision-level fusion of multiple features is employed. Experiments carried out on the Caltech101 database confirm the validity of the proposed approach. The experimental results show that the classification accuracy of the proposed method is 7%-14% higher than that of traditional BOW methods. By making full use of global, local, and spatial information, the algorithm describes the feature information of the image more completely and flexibly through multi-feature fusion and the pyramid structure formed by spatial multi-resolution decomposition of the image. As a result, classification accuracy improves significantly.
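Decision-level fusion of several per-feature classifiers can be sketched as a weighted sum of their class scores. The abstract does not state how the adaptive weights are computed, so the example below assumes, purely for illustration, that each feature's weight is proportional to its validation accuracy; `fuse_decisions` and all numbers are hypothetical.

```python
def fuse_decisions(scores_per_feature, accuracies):
    """Weighted decision-level fusion of per-feature class scores.

    scores_per_feature: one list of class scores per feature
                        (for example PHOW, PHOC, PHOG classifiers).
    accuracies: validation accuracy of each per-feature classifier,
                used here as an assumed stand-in for adaptive weights.
    Returns (predicted_class_index, fused_scores).
    """
    total = sum(accuracies)
    weights = [a / total for a in accuracies]  # normalize so weights sum to 1
    n_classes = len(scores_per_feature[0])
    fused = [
        sum(w * scores[c] for w, scores in zip(weights, scores_per_feature))
        for c in range(n_classes)
    ]
    return fused.index(max(fused)), fused


# Hypothetical two-class scores from three feature-specific classifiers.
scores = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
val_acc = [0.50, 0.25, 0.25]
label, fused_scores = fuse_decisions(scores, val_acc)
print(label, fused_scores)
```

In this toy case the first classifier alone would pick class 0, but the fused decision follows the two classifiers that agree on class 1, which shows how complementary features can correct a single feature's error.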
To improve the quality of infrared images and enhance object information, a dual-band infrared image fusion method based on feature extraction and a novel multiple pulse coupled neural network (multi-PCNN) is proposed. In this multi-PCNN fusion scheme, an auxiliary PCNN, which captures the characteristics of the feature image extracted from the infrared image, is used to modulate the main PCNN, whose input can be the original infrared image. Meanwhile, to make the PCNN fusion effect consistent with the human visual system, Laplacian energy is adopted to obtain the adaptive linking strength in the PCNN. After that, the original dual-band infrared images are reconstructed using a weighted fusion rule with the fire mapping images generated by the main PCNNs to obtain the fused image. Compared with wavelet transforms, Laplacian pyramids, and traditional multi-PCNNs, fusion images produced by our method contain more information, richer details, and clearer edges.
The IHS (Intensity, Hue and Saturation) transform is one of the most commonly used fusion algorithms. However, matching error causes spectral distortion and degradation when images are fused with the IHS method, and a study of IHS fusion indicates that the color distortion cannot be avoided. Meanwhile, the statistical properties of the wavelet coefficients obtained by wavelet decomposition reflect significant features such as edges, lines, and regions. A united optimal fusion method is therefore proposed that uses these statistical properties and the IHS transform at the pixel and feature levels. Specifically, the high-frequency part of the intensity component I is fused at the feature level with a multi-resolution wavelet in IHS space, and the low-frequency part of the intensity component I is fused at the pixel level with optimal weight coefficients. Spectral information and spatial resolution are the two performance indexes for the optimal weight coefficients. Experimental results with QuickBird data of Shanghai show that the method is practical and effective.
To improve the accuracy and stability of fruit and vegetable image recognition, which are limited when a single feature is used, this project proposes multi-feature fusion and SVM classification algorithms. The project introduces the reproducing kernel Hilbert space to improve multi-feature compatibility and the multi-feature fusion algorithm, and it introduces a TPS transformation model into the SVM classifier to improve the classification accuracy, real-time performance, and robustness of the integrated features. Experimental results show that, with the multi-feature fusion and SVM classification algorithms, common fruit and vegetable images can be recognized efficiently and accurately.
Medical image fusion is considered the best method for obtaining a single image with rich details for efficient medical diagnosis and therapy. Deep learning provides high performance in several medical image analysis applications. This paper proposes a deep learning model, based on a Convolutional Neural Network (CNN), for the medical image fusion process. The basic idea of the proposed model is to extract features from both CT and MR images. Then, an additional process is executed on the extracted features. After that, the fused feature map is reconstructed to obtain the resulting fused image. Finally, the quality of the fused image is enhanced by various enhancement techniques such as Histogram Matching (HM), Histogram Equalization (HE), fuzzy technique, fuzzy type, and Contrast Limited Adaptive Histogram Equalization (CLAHE). The performance of the proposed fusion-based CNN model is measured by various metrics of fusion and enhancement quality. Different realistic datasets of different modalities and diseases are tested and implemented. Also, real datasets are tested in the simulation analysis.
Semantic segmentation of remote sensing images is one of the core tasks of remote sensing image interpretation. With the continuous development of artificial intelligence technology, the use of deep learning methods for interpreting remote sensing images has matured. Existing neural networks disregard the spatial relationship between two targets in remote sensing images. Semantic segmentation models that combine convolutional neural networks (CNNs) and graph convolutional neural networks (GCNs) lose feature boundaries, which leads to unsatisfactory segmentation of the boundaries of various target features. In this paper, we propose a new semantic segmentation model for remote sensing images (hereinafter DGCN), which combines deep semantic segmentation networks (DSSN) and GCNs. In the GCN module, a loss function for boundary information is employed to optimize the learning of spatial relationship features between the target features and their relationships. A hierarchical fusion method is used for feature fusion and classification to optimize the spatial relationship information in the original feature information. Extensive experiments on the ISPRS 2D and DeepGlobe semantic segmentation datasets show that, compared with existing semantic segmentation models for remote sensing images, DGCN significantly improves the segmentation of feature boundaries, effectively reduces noise in the segmentation results, and improves segmentation accuracy.
A novel feature fusion method is proposed for the edge detection of color images. In addition to the typical features used in edge detection, color contrast similarity and orientation consistency are selected as features. The four features are combined into a single parameter to detect the edges of color images. Experimental results show that the method suppresses noisy edges and facilitates the detection of weak edges. It performs better than conventional methods in noisy environments.
Image captioning involves two major modalities (image and sentence) and converts a given image into language that adheres to its visual semantics. Almost all methods first extract image features to reduce the difficulty of visual semantic embedding and then use a caption model to generate fluent sentences. The Convolutional Neural Network (CNN) is often used to extract image features in image captioning, and the use of object detection networks to extract region features has achieved great success. However, the region features retrieved by this method are object-level and do not capture fine-grained details because of the detection model's limitations. We offer an approach to this issue that generates captions more accurately by fusing fine-grained features and region features. First, we extract fine-grained features using a panoptic segmentation algorithm. Second, we suggest two fusion methods and compare their fusion outcomes. An X-linear Attention Network (X-LAN) serves as the foundation for both fusion methods. According to experimental findings on the COCO dataset, the two-branch fusion approach is superior. Notably, on the COCO Karpathy test split, CIDEr increases to 134.3% compared with the baseline, highlighting the effectiveness and viability of our method.
Medical image segmentation is an important application of computer vision in medical image processing. Because different organs in medical images are closely located and highly similar, current segmentation algorithms suffer from mis-segmentation and poor edge segmentation. To address these challenges, we propose a medical image segmentation network (AF-Net) based on an attention mechanism and feature fusion, which can effectively capture global information while focusing the network on the object area. First, we add dual attention blocks (DA-block), which comprise parallel channel and spatial attention branches, to the backbone network to adaptively calibrate and weigh features. Second, a multi-scale feature fusion block (MFF-block) is proposed to obtain feature maps with different receptive fields and to capture multi-scale information at low computational cost. Finally, to restore the locations and shapes of organs, we adopt global feature fusion blocks (GFF-block) to fuse high-level and low-level information, which yields accurate pixel positioning. We evaluate our method on multiple datasets (the aorta and lungs datasets), and the experimental results reach 94.0% in mIoU and 96.3% in DICE, showing that our approach performs better than U-Net and other state-of-the-art methods.
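The mIoU and DICE scores reported for AF-Net are overlap measures between a predicted mask and a ground-truth mask. The sketch below shows the standard binary definitions in plain Python; `dice_and_iou` and the 2x2 masks are illustrative assumptions, and mIoU is simply the IoU averaged over classes.

```python
def dice_and_iou(pred, target):
    """Dice coefficient and IoU for two binary masks (lists of rows of 0/1)."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    pred_area = sum(1 for a in p if a)
    target_area = sum(1 for b in t if b)
    union = pred_area + target_area - intersection
    if union == 0:
        return 1.0, 1.0  # both masks empty: treated as a perfect match in this sketch
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy masks (hypothetical): the prediction covers one extra pixel.
prediction = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [0, 0]]
print(dice_and_iou(prediction, ground_truth))
```

For any pair of masks, Dice is at least as large as IoU (Dice = 2 IoU / (1 + IoU)), which is consistent with the DICE figure in the abstract being higher than the mIoU figure.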
Spectral decomposition has been widely used in the detection and identification of underground anomalous features (such as faults, river channels, and karst caves). However, the conventional spectral decomposition method is constrained by the window function and hence mostly has low time-frequency focusing and resolution, which hampers the fine interpretation of seismic targets. To solve this problem, we investigated sparse inverse spectral decomposition constrained by the lp norm (0 < p ≤ 1). Using a numerical model, we demonstrated the higher time-frequency resolution of this method and its capability to improve the seismic interpretation of thin layers. Moreover, because the actual underground geology is often complex, we further propose a p-norm constrained inverse spectral attribute interpretation method based on multiresolution time-frequency feature fusion. By comprehensively analyzing the time-frequency spectra obtained under different p-norm constraints, we can obtain more refined interpretation results than those of the traditional strategy, which uses a single norm constraint. Finally, the proposed strategy was applied to the processing and interpretation of actual three-dimensional seismic data for a study area covering about 230 km² in western China. The results reveal that the surface water system in this area is characterized by stepwise convergence from a higher position in the north (a buried hill) toward the south and by the development of faults. We thus demonstrated that the proposed method has considerable application potential in seismic interpretation.
Food image recognition, a subset of fine-grained image recognition, must cope with substantial intra-class variation and minimal inter-class differences. These challenges are compounded by the irregular and multi-scale nature of food images. To address them, our study introduces a model that leverages multiple attention mechanisms and multi-stage local fusion, built on the ConvNeXt architecture. The model employs hybrid attention (HA) mechanisms to pinpoint critical discriminative regions within images, substantially reducing the influence of background noise. It also introduces a multi-stage local fusion (MSLF) module, which establishes long-distance dependencies between feature maps at different stages. This approach allows complementary features to be assimilated across scales, significantly strengthening the model's capacity for feature extraction. In addition, we constructed a dataset named Roushi60, which consists of 60 categories of common meat dishes. Empirical evaluation on the ETH Food-101, ChineseFoodNet, and Roushi60 datasets shows that our model achieves recognition accuracies of 91.12%, 82.86%, and 92.50%, respectively. These figures mark improvements of 1.04%, 3.42%, and 1.36% over the base ConvNeXt network and also surpass most contemporary food image recognition methods, demonstrating the efficacy of the proposed model for food image recognition.
Funding: This project is supported by the National Natural Science Foundation of China (NSFC) (No. 61902158).
Abstract: The degradation of optical remote sensing images due to atmospheric haze poses a significant obstacle, profoundly impeding their effective utilization across various domains. Dehazing methodologies have emerged as pivotal components of image preprocessing, fostering an improvement in the quality of remote sensing imagery. This enhancement renders remote sensing data more indispensable, thereby enhancing the accuracy of target identification. Conventional defogging techniques based on simplistic atmospheric degradation models have proven inadequate for mitigating non-uniform haze within remotely sensed images. In response to this challenge, a novel UNet Residual Attention Network (URA-Net) is proposed. This approach is an end-to-end convolutional neural network distinguished by its use of multi-scale dense feature fusion clusters and gated jump connections. The essence of our methodology lies in local feature fusion within dense residual clusters, enabling the extraction of pertinent features from both preceding and current local data, depending on contextual demands. The gated structures facilitate the propagation of these features to the decoder, resulting in superior outcomes in haze removal. Empirical validation through extensive experiments substantiates the efficacy of URA-Net, demonstrating its superior performance compared to existing methods when applied to established datasets for remote sensing image defogging. On the RICE-1 dataset, URA-Net achieves a Peak Signal-to-Noise Ratio (PSNR) of 29.07 dB, surpassing the Dark Channel Prior (DCP) by 11.17 dB, the All-in-One Network for Dehazing (AOD) by 7.82 dB, the Optimal Transmission Map and Adaptive Atmospheric Light For Dehazing (OTM-AAL) by 5.37 dB, the Unsupervised Single Image Dehazing (USID) by 8.0 dB, and the Superpixel-based Remote Sensing Image Dehazing (SRD) by 8.5 dB. Particularly noteworthy, on the SateHaze1k dataset, URA-Net attains the best overall performance, yielding defogged images characterized by consistent visual quality. This underscores the contribution of the research to the advancement of remote sensing technology, providing a robust and efficient solution for alleviating the adverse effects of haze on image quality.
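The abstract above reports its results as PSNR in dB. As a reminder of what that metric measures, here is a minimal sketch of the standard PSNR formula, 10·log10(MAX² / MSE), on flat lists of pixel values. The function name `psnr`, the 8-bit peak value of 255, and the sample pixel values are assumptions made for illustration; none of this is taken from the URA-Net paper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak Signal-to-Noise Ratio (dB) between two equally sized images.

    Images are given as flat sequences of pixel intensities.
    max_value is the peak intensity (255 assumed here for 8-bit data).
    """
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixel values, for illustration only.
clean = [52, 55, 61, 59, 79, 61, 76, 61]
dehazed = [54, 55, 60, 57, 80, 63, 75, 61]
print(round(psnr(clean, dehazed), 2))
```

Because PSNR is logarithmic in the mean squared error, a fixed gain in dB corresponds to a fixed multiplicative reduction in error, which is why the comparisons in the abstract are stated as dB differences.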
Funding: Ministry of Education, Youth and Sports of the Czech Republic, Grant/Award Numbers: SP2023/039, SP2023/042; the European Union under the REFRESH, Grant/Award Number: CZ.10.03.01/00/22_003/0000048.
Abstract: Detecting brain tumours is complex due to the natural variation in their location, shape, and intensity in images. While accurate detection and segmentation of brain tumours would be beneficial, current methods have yet to solve this problem despite the numerous available approaches. Precise analysis of Magnetic Resonance Imaging (MRI) is crucial for detecting, segmenting, and classifying brain tumours in medical diagnostics. Magnetic Resonance Imaging is a vital component in medical diagnosis, and it requires precise, careful, efficient, and reliable image analysis techniques. The authors developed a Deep Learning (DL) fusion model to classify brain tumours reliably. Deep Learning models require large amounts of training data to achieve good results, so the researchers utilised data augmentation techniques to increase the dataset size for training models. VGG16, ResNet50, and convolutional deep belief networks extracted deep features from MRI images. Softmax was used as the classifier, and the training set was supplemented with intentionally created MRI images of brain tumours in addition to the genuine ones. The features of two DL models were combined in the proposed model to generate a fusion model, which significantly increased classification accuracy. An openly accessible dataset from the internet was used to test the model's performance, and the experimental results showed that the proposed fusion model achieved a classification accuracy of 98.98%. Finally, the results were compared with existing methods, and the proposed model outperformed them significantly.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61472103, 61772158, U.1711265).
Abstract: To address the very high-resolution (VHR) image classification problem, a feature extraction and fusion framework is presented for VHR panchromatic and multispectral image classification based on deep learning techniques. The proposed approach combines spectral and spatial information by fusing features extracted from panchromatic (PAN) and multispectral (MS) images using a sparse autoencoder and its deep version. The proposed method has three steps: the first extracts spatial information from the PAN image, and the second describes the spectral information of the MS image. In the third step, the features obtained from the PAN and MS images are concatenated directly as a simple fusion feature. Classification is performed using a support vector machine (SVM), and experiments are carried out on two datasets with very high spatial resolution. Results on MS and PAN images from the WorldView-2 satellite indicate that the classifier provides an efficient solution and demonstrate that fusing the features extracted by deep learning techniques from PAN and MS images performs better than using these techniques separately. In addition, this framework shows that deep learning models can effectively extract and fuse spatial and spectral information, and have huge potential to achieve higher accuracy for classification of multispectral and panchromatic images.
Funding: Supported by the National Natural Science Foundation of China (41171336) and the Project of Jiangsu Province Agricultural Science and Technology Innovation Fund (CX12-3054).
Abstract: Because of cloudy and rainy weather in south China, optical remote sensing images often cannot be obtained easily. Using regional trial results from Baoying, Jiangsu province, this paper explores the fusion model and effect of ENVISAT/SAR and HJ-1A satellite multispectral remote sensing images. Based on the ARSIS strategy, and using the wavelet transform and the Interaction between the Band Structure Model (IBSM), the study performed wavelet decomposition of the ENVISAT satellite SAR and HJ-1A satellite CCD images, reconstructed the low- and high-frequency coefficients, and obtained the fused images through the inverse wavelet transform. Because low- and high-frequency images have different characteristics in different areas, different fusion rules were adopted to make the integration process more self-adaptive. The results were compared with the PCA transformation, the IHS transformation, and other traditional methods through subjective and corresponding quantitative evaluation. Furthermore, the study extracted the band and NDVI values before and after fusion at GPS sample points, and analyzed and explained the fusion effect. The results showed that the spectral distortion of the wavelet fusion, IHS transform, and PCA transform images was 0.1016, 0.3261, and 1.2772, respectively, and the entropy was 14.7015, 11.8993, and 13.2293, respectively, with the wavelet fusion having the highest entropy. The wavelet method maintained good spectral fidelity and visual effects while improving the spatial resolution, and its information interpretation effect was much better than that of the other two methods.
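The abstract above uses NDVI values to assess the fused images. The sketch below shows the standard NDVI definition, (NIR − Red) / (NIR + Red), for a single pixel. The function name `ndvi`, the zero-denominator convention, and the sample reflectance values are assumptions for illustration and are not taken from the paper.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel.

    nir and red are reflectances (or digital numbers) of the
    near-infrared and red bands. Returns a value in [-1, 1] for
    non-negative inputs.
    """
    total = nir + red
    if total == 0:
        # Convention assumed here: report 0.0 when both bands are zero.
        return 0.0
    return (nir - red) / total


# Hypothetical reflectances, for illustration only.
print(ndvi(0.5, 0.1))   # vegetation-like pixel: strongly positive
print(ndvi(0.2, 0.25))  # bare-soil-like pixel: near zero or negative
```

Comparing NDVI computed from the original multispectral bands with NDVI computed from the fused bands at the same sample points is one simple way to check how much spectral information a fusion method preserves.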
Funding: Major Program of National Natural Science Foundation of China (NSFC12292980, NSFC12292984); National Key R&D Program of China (2023YFA1009000, 2023YFA1009004, 2020YFA0712203, 2020YFA0712201); Major Program of National Natural Science Foundation of China (NSFC12031016); Beijing Natural Science Foundation (BNSFZ210003); Department of Science, Technology and Information of the Ministry of Education (8091B042240).
Abstract: Gliomas have the highest mortality rate of all brain tumors. Correctly classifying the glioma risk period can help doctors make reasonable treatment plans and improve patients' survival rates. This paper proposes a hierarchical multi-scale attention feature fusion medical image classification network (HMAC-Net), which effectively combines global features and local features. The network framework consists of three parallel layers: the global feature extraction layer, the local feature extraction layer, and the multi-scale feature fusion layer. A linear sparse attention mechanism is designed in the global feature extraction layer to reduce information redundancy. In the local feature extraction layer, a bilateral local attention mechanism is introduced to improve the extraction of relevant information between adjacent slices. In the multi-scale feature fusion layer, a channel fusion block combining a convolutional attention mechanism and a residual inverse multi-layer perceptron is proposed to prevent vanishing gradients and network degradation and to improve feature representation capability. A double-branch iterative multi-scale classification block is used to improve the classification performance. On the brain glioma risk grading dataset, the results of the ablation and comparison experiments show that the proposed HMAC-Net has the best performance in both qualitative analysis of heat maps and quantitative analysis of evaluation indicators. On the skin cancer classification dataset, the generalization experiment results show that the proposed HMAC-Net generalizes well.
Funding: Supported by the National Key R&D Program of China (2018AAA0102100); the National Natural Science Foundation of China (No. 62376287); the International Science and Technology Innovation Joint Base of Machine Vision and Medical Image Processing in Hunan Province (2021CB1013); the Key Research and Development Program of Hunan Province (2022SK2054); the Natural Science Foundation of Hunan Province (Nos. 2022JJ30762, 2023JJ70016); and the 111 Project under Grant (No. B18059).
Abstract: Automatic segmentation of medical images provides a reliable scientific basis for disease diagnosis and analysis. Notably, most existing methods that combine the strengths of convolutional neural networks (CNNs) and Transformers have made significant progress. However, the current integration of CNN and Transformer technology has limitations in two key aspects. Firstly, most methods either overlook or fail to fully incorporate the complementary nature of local and global features. Secondly, the significance of integrating the multi-scale encoder features from the dual-branch network to enhance the decoding features is often disregarded in methods that combine CNN and Transformer. To address these issues, we present a dual-branch cross-attention fusion network (DCFNet), which efficiently combines the power of Swin Transformer and CNN to generate complementary global and local features. We then designed the Feature Cross-Fusion (FCF) module to efficiently fuse local and global features. In the FCF, the Channel-wise Cross-fusion Transformer (CCT) aggregates multi-scale features, and the Feature Fusion Module (FFM) is employed to effectively aggregate dual-branch prominent feature regions from the spatial perspective. Furthermore, within the decoding phase of the dual-branch network, our proposed Channel Attention Block (CAB) aims to emphasize the significance of the channel features between the up-sampled features and the features generated by the FCF module to enhance the details of the decoding. Experimental results demonstrate that DCFNet exhibits enhanced accuracy in segmentation performance. Compared to other state-of-the-art (SOTA) methods, our segmentation framework is highly competitive. DCFNet's accurate segmentation of medical images can greatly assist medical professionals in making crucial diagnoses of lesion areas in advance.
Funding: Supported by Shandong Province Key R and D Program, No. 2021SFGC0504; Shandong Provincial Natural Science Foundation, No. ZR2021MF079; Science and Technology Development Plan of Jinan (Clinical Medicine Science and Technology Innovation Plan), No. 202225054.
Abstract: Depression is a common mental health disorder. With current depression detection methods, specialized physicians often engage in conversations and physiological examinations based on standardized scales as auxiliary measures for depression assessment. Non-biological markers, typically classified as verbal or non-verbal and deemed crucial evaluation criteria for depression, have not been effectively utilized. Specialized physicians usually require extensive training and experience to capture changes in these features. Advancements in deep learning technology have provided technical support for capturing non-biological markers. Several researchers have proposed automatic depression estimation (ADE) systems based on sounds and videos to assist physicians in capturing these features and conducting depression screening. This article summarizes commonly used public datasets and recent research on audio- and video-based ADE from three perspectives: datasets, deficiencies in existing research, and future development directions.
Abstract: Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To balance complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in the DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses a parallel multi-resolution branch architecture to process the input feature map and extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
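The abstract above relies on two generic properties of depthwise separable and dilated convolutions: fewer weights than a standard convolution, and a wider field of view at no extra parameter cost. The sketch below illustrates both with the usual textbook counts. The helper names, the layer sizes (3×3 kernel, 32 input and 64 output channels), and the omission of bias terms are assumptions for illustration; they do not describe the paper's actual DDSC layer.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


# Hypothetical layer sizes, for illustration only.
k, c_in, c_out = 3, 32, 64
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(sep / std, 4))
print(effective_kernel_size(3, 2))  # a dilated 3 x 3 kernel spans 5 x 5 pixels
```

Dilation changes only the spacing between kernel taps, so the parameter count of the depthwise stage is the same for any dilation rate while the covered area grows.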
Funding: Supported by the National Key Research and Development Program Topics (Grant No. 2021YFB4000905), the National Natural Science Foundation of China (Grant Nos. 62101432 and 62102309), and in part by Shaanxi Natural Science Fundamental Research Program Project (No. 2022JM-508).
Abstract: Low-light image enhancement methods have limitations in addressing issues such as color distortion, lack of vibrancy, and uneven light distribution, and often require paired training data. To address these issues, we propose a two-stage unsupervised low-light image enhancement algorithm called Retinex and Exposure Fusion Network (RFNet), which can overcome the problems of over-enhancement of the high dynamic range and under-enhancement of the low dynamic range in existing enhancement algorithms. This algorithm can better manage the challenges brought about by complex environments in real-world scenarios by training with unpaired low-light images and regular-light images. In the first stage, we design a multi-scale feature extraction module based on Retinex theory, capable of extracting details and structural information at different scales to generate high-quality illumination and reflection images. In the second stage, an exposure image generator is designed through the camera response mechanism function to acquire exposure images containing more dark features, and the generated images are fused with the original input images to complete the low-light image enhancement. Experiments show the effectiveness and rationality of each module designed in this paper. The method reconstructs the details of contrast and color distribution, outperforms the current state-of-the-art methods in both qualitative and quantitative metrics, and shows excellent performance in the real world.
Funding: Supported by Foundation for Innovative Research Groups of the National Natural Science Foundation of China (61321002); Projects of Major International (Regional) Joint Research Program NSFC (61120106010); Beijing Education Committee Cooperation Building Foundation Project; Program for Changjiang Scholars and Innovative Research Team in University (IRT1208).
Abstract: Image classification based on bag-of-words (BOW) has broad application prospects in the pattern recognition field, but shortcomings such as reliance on a single feature and low classification accuracy are apparent. To deal with this problem, this paper proposes to combine two ingredients: (i) three mutually complementary features are adopted to describe the images, including the pyramid histogram of words (PHOW), the pyramid histogram of color (PHOC), and the pyramid histogram of orientated gradients (PHOG); (ii) an adaptive feature-weight adjusted image categorization algorithm based on the SVM and decision-level fusion of multiple features is employed. Experiments carried out on the Caltech101 database confirm the validity of the proposed approach. The experimental results show that the classification accuracy of the proposed method is 7%-14% higher than that of the traditional BOW methods. With full utilization of global, local, and spatial information, the algorithm describes the feature information of the image more completely and flexibly through multi-feature fusion and the pyramid structure formed by spatial multi-resolution decomposition of the image. As a result, significant improvements in classification accuracy are achieved.
Funding: Supported by the National Natural Science Foundation of China (60905012, 60572058).
Abstract: To improve the quality of the infrared image and enhance the information of the object, a dual-band infrared image fusion method based on feature extraction and a novel multiple pulse coupled neural network (multi-PCNN) is proposed. In this multi-PCNN fusion scheme, the auxiliary PCNN, which captures the characteristics of the feature image extracted from the infrared image, is used to modulate the main PCNN, whose input can be the original infrared image. Meanwhile, to make the PCNN fusion effect consistent with the human vision system, Laplacian energy is adopted to obtain the value of the adaptive linking strength in the PCNN. After that, the original dual-band infrared images are reconstructed by using a weight fusion rule with the fire mapping images generated by the main PCNNs to obtain the fused image. Compared to wavelet transforms, Laplacian pyramids, and traditional multi-PCNNs, fused images based on our method contain more information, richer details, and clearer edges.
Funding: Supported by the High Technology Research and Development Programme of China (2001AA135091) and the National Natural Science Foundation of China (60375008).
Abstract: The IHS (Intensity, Hue and Saturation) transform is one of the most commonly used fusion algorithms. However, the matching error causes spectral distortion and degradation when images are fused with the IHS method. A study of IHS fusion indicates that the color distortion cannot be avoided. Meanwhile, the statistical properties of the wavelet coefficients obtained by wavelet decomposition reflect significant features such as edges, lines, and regions. Therefore, a united optimal fusion method, which uses the statistical properties and the IHS transform on the pixel and feature levels, is proposed. Specifically, the high-frequency part of the intensity component I is fused on the feature level with multi-resolution wavelets in IHS space, and the low-frequency part of the intensity component I is fused on the pixel level with optimal weight coefficients. Spectral information and spatial resolution are the two performance indexes for the optimal weight coefficients. Experimental results with QuickBird data of Shanghai show that it is a practical and effective method.
Funding: This paper has been supported by the National Natural Science Foundation of China (Grant No. 61371040).
Abstract: To improve on the accuracy and stability of fruit and vegetable image recognition based on a single feature, this project proposes multi-feature fusion algorithms and SVM classification algorithms. The project introduces the Reproducing Kernel Hilbert space to improve multi-feature compatibility and the multi-feature fusion algorithm, and also introduces the TPS transformation model in the SVM classifier to improve the classification accuracy, real-time performance, and robustness of the fused features. Experimental results show that, by using the multi-feature fusion algorithms and SVM classification algorithms, common fruit and vegetable images can be recognized efficiently and accurately.
Abstract: Medical image fusion is considered the best method for obtaining one image with rich details for efficient medical diagnosis and therapy. Deep learning provides high performance for several medical image analysis applications. This paper proposes a deep learning model for the medical image fusion process. This model depends on a Convolutional Neural Network (CNN). The basic idea of the proposed model is to extract features from both CT and MR images. Then, an additional process is executed on the extracted features. After that, the fused feature map is reconstructed to obtain the resulting fused image. Finally, the quality of the resulting fused image is enhanced by various enhancement techniques such as Histogram Matching (HM), Histogram Equalization (HE), fuzzy technique, fuzzy type, and Contrast Limited Adaptive Histogram Equalization (CLAHE). The performance of the proposed fusion-based CNN model is measured by various metrics of the fusion and enhancement quality. Different realistic datasets of different modalities and diseases are tested and implemented. Also, real datasets are tested in the simulation analysis.
Funding: Funded by the Major Scientific and Technological Innovation Project of Shandong Province, Grant No. 2022CXGC010609.
Abstract: Semantic segmentation of remote sensing images is one of the core tasks of remote sensing image interpretation. With the continuous development of artificial intelligence technology, the use of deep learning methods for interpreting remote sensing images has matured. Existing neural networks disregard the spatial relationship between two targets in remote sensing images. Semantic segmentation models that combine convolutional neural networks (CNNs) and graph convolutional neural networks (GCNs) cause a lack of feature boundaries, which leads to the unsatisfactory segmentation of various target feature boundaries. In this paper, we propose a new semantic segmentation model for remote sensing images (called DGCN hereinafter), which combines deep semantic segmentation networks (DSSN) and GCNs. In the GCN module, a loss function for boundary information is employed to optimize the learning of spatial relationship features between the target features and their relationships. A hierarchical fusion method is utilized for feature fusion and classification to optimize the spatial relationship information in the original feature information. Extensive experiments on the ISPRS 2D and DeepGlobe semantic segmentation datasets show that, compared with existing semantic segmentation models for remote sensing images, the DGCN significantly improves the segmentation of feature boundaries, effectively reduces the noise in the segmentation results, and improves the segmentation accuracy, which demonstrates the advantages of our model.
Funding: Supported partly by the National Basic Research Program of China (2005CB724303), the National Natural Science Foundation of China (60671062), and the Shanghai Leading Academic Discipline Project (B112).
Abstract: A novel feature fusion method is proposed for the edge detection of color images. In addition to the typical features used in edge detection, the color contrast similarity and the orientation consistency are also selected as features. The four features are combined into a single parameter to detect the edges of color images. Experimental results show that the method can suppress noisy edges and facilitate the detection of weak edges. It performs better than conventional methods in noisy environments.
Funding: Supported in part by the National Natural Science Foundation of China (NSFC) under Grant 6150140 and in part by the Youth Innovation Project (21032158-Y) of Zhejiang Sci-Tech University.
Abstract: Image captioning involves two different major modalities (image and sentence) and converts a given image into language that adheres to visual semantics. Almost all methods first extract image features to reduce the difficulty of visual semantic embedding and then use the caption model to generate fluent sentences. The Convolutional Neural Network (CNN) is often used to extract image features in image captioning, and the use of object detection networks to extract region features has achieved great success. However, the region features retrieved by this method are object-level and do not pay attention to fine-grained details because of the detection model's limitation. To address this issue, we offer an approach that generates captions more accurately by fusing fine-grained features and region features. First, we extract fine-grained features using a panoramic segmentation algorithm. Second, we suggest two fusion methods and contrast their fusion outcomes. An X-linear Attention Network (X-LAN) serves as the foundation for both fusion methods. According to experimental findings on the COCO dataset, the two-branch fusion approach is superior. It is important to note that on the COCO Karpathy test split, CIDEr is increased up to 134.3% in comparison to the baseline, highlighting the potency and viability of our method.
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant 61772561, author J.Q, http://www.nsfc.gov.cn/; in part by the Science Research Projects of Hunan Provincial Education Department under Grant 18A174, author X.X, http://kxjsc.gov.hnedu.cn/; in part by the Science Research Projects of Hunan Provincial Education Department under Grant 19B584, author Y.T, http://kxjsc.gov.hnedu.cn/; in part by the Natural Science Foundation of Hunan Province (No. 2020JJ4140), author Y.T, http://kjt.hunan.gov.cn/; in part by the Natural Science Foundation of Hunan Province (No. 2020JJ4141), author X.X, http://kjt.hunan.gov.cn/; in part by the Key Research and Development Plan of Hunan Province under Grant 2019SK2022, author Y.T, http://kjt.hunan.gov.cn/; in part by the Key Research and Development Plan of Hunan Province under Grant CX20200730, author G.H, http://kjt.hunan.gov.cn/; and in part by the Graduate Science and Technology Innovation Fund Project of Central South University of Forestry and Technology under Grant CX20202038, author G.H, http://jwc.csuft.edu.cn/.
Abstract: Medical image segmentation is an important application field of computer vision in medical image processing. Due to the close location and high similarity of different organs in medical images, current segmentation algorithms have problems with mis-segmentation and poor edge segmentation. To address these challenges, we propose a medical image segmentation network (AF-Net) based on attention mechanism and feature fusion, which can effectively capture global information while focusing the network on the object area. In this approach, we first add dual attention blocks (DA-block), which comprise parallel channel and spatial attention branches, to the backbone network to adaptively calibrate and weigh features. Secondly, the multi-scale feature fusion block (MFF-block) is proposed to obtain feature maps of different receptive domains and get multi-scale information with less computational consumption. Finally, to restore the locations and shapes of organs, we adopt the global feature fusion blocks (GFF-block) to fuse high-level and low-level information, which can obtain accurate pixel positioning. We evaluate our method on multiple datasets (the aorta and lungs dataset), and the experimental results achieve 94.0% in mIoU and 96.3% in DICE, showing that our approach performs better than U-Net and other state-of-the-art methods.
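The abstract above reports mIoU and DICE scores. The sketch below shows the standard definitions of both overlap metrics on flat label lists: IoU = |A∩B| / |A∪B| and Dice = 2|A∩B| / (|A| + |B|). The function names, the convention of scoring two empty masks as a perfect match, the choice to average IoU over all listed classes, and the sample labels are assumptions for illustration; the paper's exact evaluation protocol is not given in the abstract.

```python
def iou_and_dice(pred, target):
    """IoU and Dice for two binary masks given as flat sequences of 0/1."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - inter
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    iou = inter / union
    dice = 2 * inter / (pred_area + target_area)
    return iou, dice


def mean_iou(pred_labels, target_labels, num_classes):
    """Mean IoU over classes 0..num_classes-1 for flat label sequences."""
    scores = []
    for c in range(num_classes):
        pred_mask = [1 if p == c else 0 for p in pred_labels]
        target_mask = [1 if t == c else 0 for t in target_labels]
        scores.append(iou_and_dice(pred_mask, target_mask)[0])
    return sum(scores) / num_classes


# Hypothetical six-pixel example, for illustration only.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 0, 1]
print(iou_and_dice(pred, target))
print(mean_iou(pred, target, num_classes=2))
```

For a single class the two metrics are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU, which is consistent with the DICE figure in the abstract being higher than the mIoU figure.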
Funding: Supported by National Natural Science Foundation of China (Grant No. 41974140) and the PetroChina Prospective, Basic, and Strategic Technology Research Project (No. 2021DJ0606).
Abstract: Spectral decomposition has been widely used in the detection and identification of underground anomalous features (such as faults, river channels, and karst caves). However, the conventional spectral decomposition method is restrained by the window function, and hence, it mostly has low time–frequency focusing and resolution, thereby hampering the fine interpretation of seismic targets. To solve this problem, we investigated the sparse inverse spectral decomposition constrained by the lp norm (0<p≤1). Using a numerical model, we demonstrated the higher time–frequency resolution of this method and its capability for improving the seismic interpretation for thin layers. Moreover, given that the actual underground geology can often be complex, we further propose a p-norm constrained inverse spectral attribute interpretation method based on multiresolution time–frequency feature fusion. By comprehensively analyzing the time–frequency spectrum results constrained by the different p-norms, we can obtain more refined interpretation results than those obtained by the traditional strategy, which incorporates a single norm constraint. Finally, the proposed strategy was applied to the processing and interpretation of actual three-dimensional seismic data for a study area covering about 230 km² in western China. The results reveal that the surface water system in this area is characterized by stepwise convergence from a higher position in the north (a buried hill) toward the south and by the development of faults. We thus demonstrated that the proposed method has huge application potential in seismic interpretation.
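The abstract above constrains the inversion with an lp norm for 0<p≤1. A minimal sketch of why such a penalty favors sparse solutions: for two coefficient vectors with the same energy, the sum of |x_i|^p is smaller for the one that concentrates the energy in few entries, and the gap widens as p decreases. The function name `lp_penalty` and the sample vectors are assumptions for illustration; this is only the penalty term (the p-th power of the lp quasi-norm), not the paper's inversion scheme.

```python
def lp_penalty(x, p):
    """Sum of |x_i|**p, i.e. the p-th power of the lp (quasi-)norm, for 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    return sum(abs(v) ** p for v in x)


# Two hypothetical coefficient vectors with the same l2 energy (both have norm 1).
sparse = [1.0, 0.0, 0.0, 0.0]
dense = [0.5, 0.5, 0.5, 0.5]

for p in (1.0, 0.5):
    print(p, lp_penalty(sparse, p), lp_penalty(dense, p))
```

Since different values of p trade off sparsity against fidelity differently, solutions obtained with several p values can emphasize different time–frequency details, which fits the abstract's motivation for combining results across p-norms.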
Funding: This research was supported by the Hubei Provincial Natural Science Foundation (2022CFB449) and the Science Research Foundation of Education Department of Hubei Province (B2020061).
Abstract: The task of food image recognition, a nuanced subset of fine-grained image recognition, grapples with substantial intra-class variation and minimal inter-class differences. These challenges are compounded by the irregular and multi-scale nature of food images. Addressing these complexities, our study introduces an advanced model that leverages multiple attention mechanisms and multi-stage local fusion, grounded in the ConvNeXt architecture. Our model employs hybrid attention (HA) mechanisms to pinpoint critical discriminative regions within images, substantially mitigating the influence of background noise. Furthermore, it introduces a multi-stage local fusion (MSLF) module, fostering long-distance dependencies between feature maps at varying stages. This approach facilitates the assimilation of complementary features across scales, significantly bolstering the model's capacity for feature extraction. In addition, we constructed a dataset named Roushi60, which consists of 60 different categories of common meat dishes. Empirical evaluation on the ETH Food-101, ChineseFoodNet, and Roushi60 datasets reveals that our model achieves recognition accuracies of 91.12%, 82.86%, and 92.50%, respectively. These figures not only mark an improvement of 1.04%, 3.42%, and 1.36% over the foundational ConvNeXt network but also surpass the performance of most contemporary food image recognition methods. Such advancements underscore the efficacy of our proposed model in navigating the intricate landscape of food image recognition, setting a new benchmark for the field.