Transformer-based models have facilitated significant advances in object detection. However, their extensive computational consumption and suboptimal detection of dense small objects curtail their applicability in unmanned aerial vehicle (UAV) imagery. Addressing these limitations, we propose a hybrid transformer-based detector, H-DETR, and enhance it for dense small objects, leading to an accurate and efficient model. Firstly, we introduce a hybrid transformer encoder, which integrates a convolutional neural network-based cross-scale fusion module with the original encoder to handle multi-scale feature sequences more efficiently. Furthermore, we propose two novel strategies to enhance detection performance without incurring additional inference computation. The query filter is designed to cope with the dense clustering inherent in drone-captured images by suppressing similar queries with a training-aware non-maximum suppression. Adversarial denoising learning is a novel enhancement method inspired by adversarial learning, which improves the detection of numerous small targets by counteracting the effects of artificial spatial and semantic noise. Extensive experiments on the VisDrone and UAVDT datasets substantiate the effectiveness of our approach, achieving a significant improvement in accuracy with a reduction in computational complexity. Our method achieves 31.9% and 21.1% AP on the VisDrone and UAVDT datasets, respectively, and has a faster inference speed, making it a competitive model in UAV image object detection.
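The abstract does not give the query filter's implementation details; the following is a minimal sketch, assuming each decoder query predicts one box with a confidence score, of how a training-aware NMS could mark near-duplicate queries (the `iou_thresh` value and the keep-mask policy are illustrative assumptions, not the published design):

```python
# Illustrative only: suppress near-duplicate queries with NMS during training
# so similar queries stop competing for the same densely packed object.
import torch
from torchvision.ops import nms

def query_filter(boxes: torch.Tensor, scores: torch.Tensor,
                 iou_thresh: float = 0.7) -> torch.Tensor:
    """boxes: (N, 4) xyxy box per query; scores: (N,). Returns a keep mask."""
    keep_idx = nms(boxes, scores, iou_thresh)          # surviving query indices
    keep = torch.zeros_like(scores, dtype=torch.bool)
    keep[keep_idx] = True
    return keep   # suppressed queries can be treated as negatives in the loss

mask = query_filter(torch.tensor([[0., 0., 10., 10.], [0.5, 0.5, 10.5, 10.5]]),
                    torch.tensor([0.9, 0.8]))          # -> tensor([ True, False])
```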
Cloud detection from satellite and drone imagery is crucial for applications such as weather forecasting and environmental monitoring. Addressing the limitations of conventional convolutional neural networks, we propose an innovative transformer-based method. This method leverages transformers, which are adept at processing data sequences, to enhance cloud detection accuracy. Additionally, we introduce a Cyclic Refinement Architecture that improves the resolution and quality of feature extraction, thereby aiding the retention of critical details often lost during cloud detection. Our extensive experimental validation shows that our approach significantly outperforms established models, excelling in high-resolution feature extraction and precise cloud segmentation. By integrating Positional Visual Transformers (PVT) with this architecture, our method advances high-resolution feature delineation and segmentation accuracy. Ultimately, our research offers a novel perspective for surmounting traditional challenges in cloud detection and contributes to the advancement of precise and dependable image analysis across various domains.
Single-pixel imaging (SPI) can transform 2D or 3D image data into 1D light signals, which offers promising prospects for image compression and transmission. However, during data communication these light signals in public channels can easily draw the attention of eavesdroppers. Here, we introduce an efficient encryption method for SPI data transmission that uses the 3D Arnold transformation to directly disrupt 1D single-pixel light signals and the elliptic curve encryption algorithm for key transmission. This encryption scheme first employs Hadamard patterns to illuminate the scene and then applies the 3D Arnold transformation to permute the 1D light signal from single-pixel detection. The transformation parameters serve as the secret key, while the security of key exchange is guaranteed by an elliptic curve-based key exchange mechanism. Both computer simulations and optical experiments demonstrate that, compared with existing encryption schemes, the proposed technique not only enhances security but also eliminates the need for complicated pattern scrambling rules. Additionally, this approach solves the problem of secure key transmission, thus ensuring the security of information and the quality of the decrypted images.
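The paper's exact Arnold parameterization and key layout are not given here; the sketch below illustrates the general idea, assuming the 1D signal has length n³ so it can be reshaped into a cube and scrambled by iterating a unimodular integer matrix (determinant 1, hence invertible modulo n) over voxel coordinates:

```python
# Sketch of an Arnold-type 3D scrambling of a 1D single-pixel signal;
# the matrix A below is an illustrative choice, not the paper's parameters.
import numpy as np

A = np.array([[1, 1, 1],
              [1, 2, 2],
              [1, 2, 3]])          # det(A) = 1, so the map is a bijection mod n

def arnold3d(signal: np.ndarray, n: int, iterations: int) -> np.ndarray:
    """Scramble a length-n**3 signal by iterating a 3D Arnold map on voxel coords."""
    cube = signal.reshape(n, n, n)
    coords = np.indices((n, n, n)).reshape(3, -1)      # every (x, y, z) position
    for _ in range(iterations):
        coords = (A @ coords) % n                      # linear map modulo n
    out = np.empty_like(cube)
    out[tuple(coords)] = cube.reshape(-1)              # move voxels to new positions
    return out.reshape(-1)

sig = np.arange(4 ** 3)                 # toy 1D single-pixel signal of length 64
enc = arnold3d(sig, n=4, iterations=3)  # scrambled signal; the key is (A, iterations)
# decryption iterates the inverse matrix (A^{-1} mod n) the same number of times
```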
Photoacoustic imaging (PAI) is a noninvasive emerging imaging method based on the photoacoustic effect, which provides valuable assistance for medical diagnosis and offers large imaging depth and high contrast. However, limited by equipment cost and reconstruction time requirements, existing PAI systems with annular array transducers struggle to balance image quality and imaging speed. In this paper, a triple-path feature transform network (TFT-Net) for ring-array photoacoustic tomography is proposed to enhance imaging quality from limited-view and sparse measurement data. Specifically, the network combines raw photoacoustic pressure signals and conventional linear reconstruction images as input data, and takes the photoacoustic physical model as prior information to guide the reconstruction process. In addition, to enhance the ability to extract signal features, residual blocks and squeeze-and-excitation blocks are introduced into TFT-Net. For further efficient reconstruction, the final output of photoacoustic signals uses a 'filter-then-upsample' operation with a pixel-shuffle multiplexer and a maxout module. Experimental results on simulated and in-vivo data demonstrate that TFT-Net can restore target boundaries clearly, reduce background noise, and realize fast, high-quality photoacoustic image reconstruction from limited views with sparse sampling.
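As a rough illustration of the 'filter-then-upsample' idea, the sketch below combines a convolutional filter, PyTorch's built-in pixel shuffle, and a maxout over channel groups; all layer sizes and the group count are assumptions, not the published TFT-Net configuration:

```python
# Hedged sketch: a convolution produces out_ch * pieces * r^2 channels,
# PixelShuffle trades channels for resolution, and a maxout keeps the
# strongest response among `pieces` channel groups.
import torch
import torch.nn as nn

class FilterThenUpsample(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, scale: int = 2, pieces: int = 2):
        super().__init__()
        self.pieces = pieces
        self.filter = nn.Conv2d(in_ch, out_ch * pieces * scale ** 2, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)    # (B, C*r^2, H, W) -> (B, C, rH, rW)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.shuffle(self.filter(x))         # (B, out_ch*pieces, rH, rW)
        b, c, h, w = x.shape
        x = x.reshape(b, c // self.pieces, self.pieces, h, w)
        return x.max(dim=2).values               # maxout over the channel groups

y = FilterThenUpsample(64, 1)(torch.randn(1, 64, 32, 32))   # -> (1, 1, 64, 64)
```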
Convolutional neural networks (CNNs) have excellent ability to model locally contextual information. However, CNNs face challenges in describing long-range semantic features, which leads to relatively low classification accuracy on hyperspectral images. To address this problem, this article proposes an algorithm based on multiscale fusion and a transformer network for hyperspectral image classification. Firstly, low-level spatial-spectral features are extracted by a multi-scale residual structure. Secondly, an attention module is introduced to focus on the more important spatial-spectral information. Finally, high-level semantic features are represented and learned by a token learner and an improved transformer encoder. The proposed algorithm is compared with six classical hyperspectral classification algorithms on real hyperspectral images. The experimental results show that the proposed algorithm effectively improves land cover classification accuracy on hyperspectral images.
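A multi-scale residual structure of the kind described can be sketched as parallel convolutions with different kernel sizes summed with a skip connection; the kernel choices below are illustrative, not the paper's:

```python
# Minimal sketch of a multi-scale residual block for low-level
# spatial-spectral feature extraction (kernel sizes are assumptions).
import torch
import torch.nn as nn

class MultiScaleResidual(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, k, padding=k // 2) for k in (3, 5, 7))
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # sum the parallel multi-scale responses, then add the identity shortcut
        return self.act(x + sum(branch(x) for branch in self.branches))

y = MultiScaleResidual(16)(torch.randn(1, 16, 32, 32))   # shape preserved
```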
Breast cancer is a significant threat not only to women but to the population at large. With recent advancements in digital pathology, hematoxylin and eosin stained images provide enhanced clarity for examining microscopic features of breast tissues based on their staining properties. Early cancer detection accelerates the therapeutic process, thereby increasing survival rates. The analysis performed by medical professionals, especially pathologists, is time-consuming and challenging, so there is a need for automated breast cancer detection systems. Emerging artificial intelligence platforms, especially deep learning models, play an important role in image diagnosis and prediction. First, histopathology biopsy images are taken from standard data sources. The gathered images are then given as input to a Multi-Scale Dilated Vision Transformer, where the essential features are acquired. Subsequently, the features are passed to a Bidirectional Long Short-Term Memory (Bi-LSTM) network for classifying the breast cancer disorder. The efficacy of the model is evaluated using divergent metrics. When compared with other methods, the proposed work offers impressive detection results.
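A minimal sketch of the classification stage of the described pipeline, assuming the vision transformer emits a sequence of patch features that a bidirectional LSTM pools into class logits (all dimensions below are placeholders, not the paper's settings):

```python
# Hypothetical head: ViT patch features read as a sequence by a Bi-LSTM.
import torch
import torch.nn as nn

class BiLSTMHead(nn.Module):
    def __init__(self, feat_dim: int = 768, hidden: int = 256, n_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(tokens)        # (B, N, 2*hidden), both directions
        return self.fc(out.mean(dim=1))   # mean-pool the token sequence

logits = BiLSTMHead()(torch.randn(2, 196, 768))   # e.g. 196 ViT patch tokens
```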
Background: Document images such as statistical reports and scientific journals are widely used in information technology. Accurate detection of table areas in document images is an essential prerequisite for tasks such as information extraction. However, because of the diversity in the shapes and sizes of tables, existing table detection methods adapted from general object detection algorithms have not yet achieved satisfactory results. Incorrect detection results might lead to the loss of critical information. Methods: Therefore, we propose a novel end-to-end trainable deep network combined with a self-supervised pretraining transformer for feature extraction to minimize incorrect detections. To better deal with table areas of different shapes and sizes, we added a dual-branch context content attention module (DCCAM) to high-dimensional features to extract context content information, thereby enhancing the network's ability to learn shape features. For feature fusion at different scales, we replaced the original 3×3 convolution with a multilayer residual module, which contains enhanced gradient flow information to improve feature representation and extraction capability. Results: We evaluated our method on public document datasets and compared it with previous methods; it achieved state-of-the-art results in terms of evaluation metrics such as recall and F1-score. Code: https://github.com/YongZ-Lee/TD-DCCAM
In response to the inadequate utilization of local information when applying Vision Transformer to PolSAR image classification in existing studies, this paper proposes LIViT, a Vision Transformer method that considers local information. The method replaces the image patch sequence with a polarimetric feature sequence in the feature embedding, and uses convolution for the mapping to preserve spatial detail information. In addition, a wavelet transform branch enables the network to pay more attention to the shape and edge information of target features and improves the extraction of local edge information. Results for Wuhan, China and Flevoland, Netherlands show that considering local information when using Vision Transformer for PolSAR image classification effectively improves classification accuracy and offers clear advantages in PolSAR image classification.
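The wavelet branch can be illustrated with a single-level 2D discrete wavelet transform whose detail sub-bands capture horizontal, vertical, and diagonal edges; the choice of the Haar wavelet and a single decomposition level is an assumption, not the paper's configuration:

```python
# Sketch: detail sub-bands of a one-level 2D DWT as edge/shape cues.
import numpy as np
import pywt

def wavelet_edge_features(channel: np.ndarray) -> np.ndarray:
    """Return the three detail sub-bands of a one-level 2D Haar DWT."""
    _, (cH, cV, cD) = pywt.dwt2(channel, "haar")
    return np.stack([cH, cV, cD])     # horizontal / vertical / diagonal edges

edges = wavelet_edge_features(np.random.rand(128, 128))   # -> (3, 64, 64)
```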
Deep convolutional neural networks (CNNs) have greatly promoted the automatic segmentation of medical images. However, due to the inherent properties of convolution operations, CNNs usually cannot establish long-distance interdependence, which limits segmentation performance. Transformers have been successfully applied to various computer vision tasks, using the self-attention mechanism to model long-distance interactions and thereby capture global information. However, self-attention lacks spatial positional information and is computationally demanding. To solve these problems, we develop a new medical transformer with a multi-scale context fusion function that can be used for medical image segmentation. The proposed model combines convolution operations and attention mechanisms to form a U-shaped framework that captures both local and global information. First, the traditional transformer module is improved into an advanced transformer module, which uses post-layer normalization to obtain mild activation values and scaled cosine attention with a moving window to obtain accurate spatial information. Second, we introduce a deep supervision strategy to guide the model to fuse multi-scale feature information, further enabling the proposed model to effectively propagate feature information across layers. Thanks to this, it achieves better segmentation performance while being more robust and efficient. The proposed model is evaluated on multiple medical image segmentation datasets. Experimental results demonstrate that, on a challenging dataset (ETIS), the proposed model achieves better performance than existing methods that rely only on convolutional neural networks, transformers, or a combination of both: the mDice and mIoU indicators increased by 2.74% and 3.3%, respectively.
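Scaled cosine attention, as adopted here (and introduced in Swin Transformer V2), computes similarities as cosines of L2-normalized queries and keys divided by a learnable temperature, which keeps logits bounded and activations mild. A minimal sketch, with the shifted-window partitioning omitted for brevity:

```python
# Sketch of scaled cosine attention; shapes and the temperature floor
# follow common practice and are assumptions, not this paper's exact code.
import torch
import torch.nn.functional as F

def scaled_cosine_attention(q, k, v, tau):
    # q, k, v: (B, heads, N, d); tau: learnable per-head temperature
    q = F.normalize(q, dim=-1)                 # L2-normalise queries
    k = F.normalize(k, dim=-1)                 # and keys -> cosine similarity
    attn = (q @ k.transpose(-2, -1)) / tau.clamp(min=0.01)
    return F.softmax(attn, dim=-1) @ v

q = k = v = torch.randn(2, 4, 49, 32)          # e.g. one 7x7 attention window
tau = torch.full((1, 4, 1, 1), 0.1)
out = scaled_cosine_attention(q, k, v, tau)    # -> (2, 4, 49, 32)
```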
Breast cancer has become a leading threat to women's health. To exploit the potential representational capabilities of models more comprehensively, we propose a multi-model fusion strategy. Specifically, we combine two differently structured deep learning models, ResNet101 and Swin Transformer (SwinT), with the Convolutional Block Attention Module (CBAM) attention mechanism, making full use of SwinT's global context modeling ability and ResNet101's local feature extraction ability. Additionally, the cross-entropy loss function is replaced by the focal loss function to address the unbalanced class distribution of breast cancer datasets. The multi-classification recognition accuracies of the proposed fusion model on the 40X, 100X, 200X, and 400X BreakHis datasets are 97.50%, 96.60%, 96.30%, and 96.10%, respectively. Compared with a single SwinT model or ResNet101 model, the fusion model has higher accuracy and better generalization ability, providing a more effective method for the screening, diagnosis, and pathological classification of female breast cancer.
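The focal loss referenced above down-weights well-classified examples via FL(p_t) = -α(1 - p_t)^γ log(p_t); a minimal multi-class sketch with the common default α and γ (the paper's settings are not stated):

```python
# Multi-class focal loss sketch; alpha/gamma are the usual defaults.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha: float = 0.25, gamma: float = 2.0):
    """FL(p_t) = -alpha * (1 - p_t)**gamma * log(p_t), averaged over the batch."""
    log_p = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_p, targets, reduction="none")   # per-sample -log(p_t)
    p_t = torch.exp(-ce)                                # probability of true class
    return (alpha * (1.0 - p_t) ** gamma * ce).mean()

loss = focal_loss(torch.randn(8, 4), torch.randint(0, 4, (8,)))
```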
Radioheliographs can obtain solar images at high temporal and spatial resolution, with a high dynamic range. They are among the most important instruments for studying solar radio bursts, understanding solar eruption events, and conducting space weather forecasting. This study aims to explore the effective use of radioheliographs for solar observations, specifically for imaging coronal mass ejections (CMEs), to track their evolution and provide space weather warnings. We have developed an imaging simulation program based on the principle of aperture synthesis imaging, covering the entire data processing flow from antenna configuration to dirty map generation. For grid processing, we propose an improved non-uniform fast Fourier transform (NUFFT) method to provide superior image quality. Using simulated imaging of radio coronal mass ejections, we provide practical recommendations for the performance of radioheliographs. This study provides important support for the validation and calibration of radioheliograph data processing, and is expected to profoundly enhance our understanding of solar activities.
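The improved NUFFT itself is not specified in the abstract; as a reference point, the brute-force type-1 nonuniform DFT below maps visibilities at arbitrary (u, v) samples onto an image grid, and can serve to validate a fast gridded implementation (the sign convention and normalization are assumptions):

```python
# Reference-only dirty-map computation: O(M * n^2), no gridding or FFT.
import numpy as np

def direct_nudft_image(vis, u, v, n: int):
    """Brute-force dirty map from complex visibilities at baselines (u, v)."""
    coords = np.linspace(-0.5, 0.5, n, endpoint=False)    # direction cosines
    ll, mm = np.meshgrid(coords, coords)
    img = np.zeros((n, n), dtype=complex)
    for s, uu, vv in zip(vis, u, v):
        img += s * np.exp(2j * np.pi * (uu * ll + vv * mm))
    return img.real / len(vis)

rng = np.random.default_rng(0)                            # toy visibilities
dirty = direct_nudft_image(rng.normal(size=50) + 0j,
                           rng.uniform(-30, 30, 50),
                           rng.uniform(-30, 30, 50), n=64)
```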
Deep convolutional neural networks (DCNNs) are widely used in content-based image retrieval (CBIR) because of their advantages in image feature extraction. However, training deep neural networks requires a large amount of labeled data, which limits their application. Self-supervised learning is a more general approach in unlabeled scenarios. A method of fine-tuning feature extraction networks based on masked learning is proposed: a masked autoencoder (MAE) is used to fine-tune the vision transformer (ViT) model. In addition, the scheme for extracting image descriptors is discussed. The encoder of the MAE uses the ViT to extract global features and performs self-supervised fine-tuning by reconstructing the pixels of masked areas. The method works well on category-level image retrieval datasets and yields marked improvements on instance-level datasets. For the instance-level datasets Oxford5k and Paris6k, the retrieval accuracy of the base model is improved by 7% and 17%, respectively, compared to that of the original model.
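MAE-style self-supervised fine-tuning hinges on random patch masking before the encoder; a minimal sketch, assuming the original MAE's 75% mask ratio (the paper's ratio is not stated):

```python
# Sketch of MAE random masking: the encoder sees only the visible tokens,
# and a decoder reconstructs the pixels of the masked patches.
import torch

def random_masking(tokens: torch.Tensor, mask_ratio: float = 0.75):
    """Keep a random subset of patch tokens; the decoder reconstructs the rest."""
    b, n, d = tokens.shape
    n_keep = int(n * (1 - mask_ratio))
    ids = torch.rand(b, n).argsort(dim=1)[:, :n_keep]     # random tokens to keep
    visible = torch.gather(tokens, 1, ids.unsqueeze(-1).expand(-1, -1, d))
    return visible, ids                                   # encoder sees only `visible`

visible, ids = random_masking(torch.randn(2, 196, 768))  # 196 -> 49 visible tokens
```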
Semantic segmentation methods based on CNNs have made great progress, but shortcomings remain when they are applied to remote sensing image segmentation; for example, their small receptive fields cannot effectively capture global context. To solve this problem, this paper proposes CFM-UNet, a hybrid model based on ResNet50 and Swin Transformer that directly captures long-range dependence and fuses features through a Cross Feature Modulation Module (CFMM). Experimental results on two publicly available datasets, Vaihingen and Potsdam, reach mIoU scores of 70.27% and 76.63%, respectively. Thus, CFM-UNet maintains high segmentation performance compared with other competitive networks.
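The abstract names the Cross Feature Modulation Module but not its internal form; one plausible minimal sketch gates each branch with a sigmoid attention map derived from the other branch (this gating design is an assumption, not the published module):

```python
# Hypothetical cross-branch modulation between CNN and Swin features.
import torch
import torch.nn as nn

class CrossFeatureModulation(nn.Module):
    """Hypothetical gating: each branch is modulated by the other's attention."""
    def __init__(self, ch: int):
        super().__init__()
        self.gate_from_cnn = nn.Conv2d(ch, ch, 1)
        self.gate_from_swin = nn.Conv2d(ch, ch, 1)

    def forward(self, f_cnn: torch.Tensor, f_swin: torch.Tensor) -> torch.Tensor:
        cnn_mod = f_cnn * torch.sigmoid(self.gate_from_swin(f_swin))
        swin_mod = f_swin * torch.sigmoid(self.gate_from_cnn(f_cnn))
        return cnn_mod + swin_mod                # fused two-branch feature

fused = CrossFeatureModulation(64)(torch.randn(1, 64, 32, 32),
                                   torch.randn(1, 64, 32, 32))
```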
Vascular segmentation is a crucial task in biomedical image processing, significant for analyzing and modeling vascular networks under physiological and pathological states. With advances in fluorescent labeling and mesoscopic optical techniques, it has become possible to map whole-mouse-brain vascular networks at capillary resolution. However, segmenting vessels from mesoscopic optical images is a challenging task. Problems such as vascular signal discontinuities, vessel lumens, and background fluorescence signals in mesoscopic optical images require global semantic information to resolve during vascular segmentation. Traditional vascular segmentation methods based on convolutional neural networks (CNNs) are limited by their insufficient receptive fields, making it challenging to capture the global semantic information of vessels and resulting in inaccurate segmentation results. Here, we propose SegVesseler, a vascular segmentation method based on Swin Transformer. SegVesseler adopts 3D Swin Transformer blocks to extract global contextual information from 3D images. This approach is able to maintain the connectivity and topology of blood vessels during segmentation. We evaluated the performance of our method on mouse cerebrovascular datasets generated from three different labeling and imaging modalities. The experimental results demonstrate that the segmentation quality of our method is significantly better than that of traditional CNNs and achieves state-of-the-art performance.
With the increasing popularity of artificial intelligence applications, machine learning is also playing an increasingly important role in the Internet of Things (IoT) and the Internet of Vehicles (IoV). As an essential part of the IoV, smart transportation relies heavily on information obtained from images. However, inclement weather, such as snow, negatively impacts this process: it can hinder the regular operation of imaging equipment and the acquisition of conventional image information. Moreover, snow can cause intelligent transportation systems to misjudge road conditions, adversely affecting the entire Internet of Vehicles. This paper addresses the single-image snow removal task using a vision transformer within a generative adversarial network. The algorithm uses a residual structure, and the generator of the generative adversarial network adopts a Transformer structure, which improves the accuracy of the snow removal task. Moreover, the vision transformer has good scalability and versatility for larger models and a stronger fitting ability than the previously popular convolutional neural networks. The Snow100K dataset is used for training, testing, and comparison, with peak signal-to-noise ratio and structural similarity as evaluation indicators. The experimental results show that the improved snow removal algorithm performs well and can produce high-quality snow-free images.
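The two evaluation indicators can be computed as follows; the PSNR helper assumes 8-bit images, and SSIM is taken from recent scikit-image (>= 0.19 for `channel_axis`) rather than re-implemented:

```python
# PSNR and SSIM evaluation sketch for a ground-truth / desnowed image pair.
import numpy as np
from skimage.metrics import structural_similarity

def psnr(clean: np.ndarray, restored: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((clean.astype(np.float64) - restored.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

gt = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)   # placeholder pair
desnowed = gt.copy()
score = psnr(gt, desnowed)                                    # inf for identical images
sim = structural_similarity(gt, desnowed, channel_axis=-1, data_range=255)
```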
Contactless verification is possible with iris biometric identification, which helps prevent infections like COVID-19 from spreading. However, biometric systems have grown unsteady and vulnerable as a result of spoofing attacks employing contact lenses, replayed videos, and print attacks. This work demonstrates, for the first time, an iris liveness detection approach that uses fragmental coefficients of Haar-transformed iris images as signatures to prevent spoofing attacks. Seven assorted feature creation methods are studied in the presented solutions, and the created features are used to train eight distinct machine learning classifiers and ensembles. The predicted iris liveness identification variants are evaluated using recall, F-measure, precision, accuracy, APCER, BPCER, and ACER. Three standard datasets were used in the investigation. The main contribution of our study is achieving a good accuracy of 99.18% with a smaller feature vector: the 8×8 fragmental coefficients of the Haar-transformed iris image with the random forest algorithm showed superior iris liveness detection with a reduced feature vector size (64 features). Additionally, extensive cross-dataset experiments were conducted for detailed analysis. The results of our experiments show that the iris biometric template is reduced in size, making the proposed framework suitable for algorithmic verification in real-time environments and settings.
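A sketch of the 64-feature pipeline: a multi-level Haar decomposition keeps only the 8×8 low-frequency block as the "fragmental" signature, which then trains a random forest (the decomposition level and placeholder data are assumptions; the real experiments use the paper's three iris datasets):

```python
# Fragmental Haar coefficients (8x8 = 64 features) + random forest sketch.
import numpy as np
import pywt
from sklearn.ensemble import RandomForestClassifier

def haar_fragment_features(img: np.ndarray, size: int = 8) -> np.ndarray:
    """Keep only the size x size low-frequency block of a multi-level Haar DWT."""
    level = int(np.log2(img.shape[0] // size))          # e.g. 256 -> 5 levels
    approx = pywt.wavedec2(img.astype(float), "haar", level=level)[0]
    return approx.flatten()                             # 64-dimensional signature

imgs = [np.random.rand(256, 256) for _ in range(10)]    # placeholder iris images
X = np.stack([haar_fragment_features(im) for im in imgs])
clf = RandomForestClassifier(n_estimators=100).fit(X, np.random.randint(0, 2, 10))
```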
In medical image segmentation tasks, convolutional neural networks (CNNs) have difficulty capturing long-range dependencies, whereas transformers can model long-range dependencies effectively. However, transformers have a flexible structure and seldom assume structural bias in the input data, so it is difficult for them to learn positional encoding of medical images when trained on few images. To solve these problems, a dual-branch structure is proposed. In one branch, a Mix-Feed-Forward Network (Mix-FFN) and axial attention are adopted to capture long-range dependencies and keep the translation invariance of the model; Mix-FFN, whose depth-wise convolutions provide position information, is better than ordinary positional encoding. In the other branch, traditional convolutional neural networks (CNNs) are used to extract different features from fewer medical images. In addition, the attention fusion module BiFusion is used to effectively integrate information from the CNN branch and the Transformer branch, and the fused features effectively capture the global and local context at the current spatial resolution. On the public standard datasets Gland Segmentation (GlaS), Colorectal adenocarcinoma gland (CRAG), and COVID-19 CT Images Segmentation, the F1-score, Intersection over Union (IoU), and parameter counts of the proposed TC-Fuse are superior to those of Axial Attention U-Net, U-Net, Medical Transformer, and other methods. The F1-score increased by 2.99%, 3.42%, and 3.95%, respectively, compared with Medical Transformer.
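Mix-FFN (as in SegFormer) inserts a 3×3 depth-wise convolution between the two linear layers so that zero-padding leaks positional information, substituting for explicit positional encoding; a sketch with an assumed 4× hidden expansion:

```python
# Mix-FFN sketch: Linear -> depth-wise 3x3 conv -> GELU -> Linear.
import torch
import torch.nn as nn

class MixFFN(nn.Module):
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        hidden = dim * expansion
        self.fc1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, h*w, dim) token sequence; reshape so the depth-wise conv can
        # inject positional cues through its zero padding
        x = self.fc1(x)
        b, n, c = x.shape
        x = x.transpose(1, 2).reshape(b, c, h, w)
        x = self.dwconv(x).flatten(2).transpose(1, 2)
        return self.fc2(self.act(x))

out = MixFFN(64)(torch.randn(2, 14 * 14, 64), 14, 14)    # -> (2, 196, 64)
```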