Visual object tracking plays a crucial role in computer vision.In recent years,researchers have proposed various methods to achieve high-performance object tracking.Among these,methods based on Transformers have becom...Visual object tracking plays a crucial role in computer vision.In recent years,researchers have proposed various methods to achieve high-performance object tracking.Among these,methods based on Transformers have become a research hotspot due to their ability to globally model and contextualize information.However,current Transformer-based object tracking methods still face challenges such as low tracking accuracy and the presence of redundant feature information.In this paper,we introduce self-calibration multi-head self-attention Transformer(SMSTracker)as a solution to these challenges.It employs a hybrid tensor decomposition self-organizing multihead self-attention transformermechanism,which not only compresses and accelerates Transformer operations but also significantly reduces redundant data,thereby enhancing the accuracy and efficiency of tracking.Additionally,we introduce a self-calibration attention fusion block to resolve common issues of attention ambiguities and inconsistencies found in traditional trackingmethods,ensuring the stability and reliability of tracking performance across various scenarios.By integrating a hybrid tensor decomposition approach with a self-organizingmulti-head self-attentive transformer mechanism,SMSTracker enhances the efficiency and accuracy of the tracking process.Experimental results show that SMSTracker achieves competitive performance in visual object tracking,promising more robust and efficient tracking systems,demonstrating its potential to providemore robust and efficient tracking solutions in real-world applications.展开更多
Computer-aided diagnosis of pneumonia based on deep learning is a research hotspot.However,there are some problems that the features of different sizes and different directions are not sufficient when extracting the f...Computer-aided diagnosis of pneumonia based on deep learning is a research hotspot.However,there are some problems that the features of different sizes and different directions are not sufficient when extracting the features in lung X-ray images.A pneumonia classification model based on multi-scale directional feature enhancement MSD-Net is proposed in this paper.The main innovations are as follows:Firstly,the Multi-scale Residual Feature Extraction Module(MRFEM)is designed to effectively extract multi-scale features.The MRFEM uses dilated convolutions with different expansion rates to increase the receptive field and extract multi-scale features effectively.Secondly,the Multi-scale Directional Feature Perception Module(MDFPM)is designed,which uses a three-branch structure of different sizes convolution to transmit direction feature layer by layer,and focuses on the target region to enhance the feature information.Thirdly,the Axial Compression Former Module(ACFM)is designed to perform global calculations to enhance the perception ability of global features in different directions.To verify the effectiveness of the MSD-Net,comparative experiments and ablation experiments are carried out.In the COVID-19 RADIOGRAPHY DATABASE,the Accuracy,Recall,Precision,F1 Score,and Specificity of MSD-Net are 97.76%,95.57%,95.52%,95.52%,and 98.51%,respectively.In the chest X-ray dataset,the Accuracy,Recall,Precision,F1 Score and Specificity of MSD-Net are 97.78%,95.22%,96.49%,95.58%,and 98.11%,respectively.This model improves the accuracy of lung image recognition effectively and provides an important clinical reference to pneumonia Computer-Aided Diagnosis.展开更多
Convolutional neural network(CNN)has excellent ability to model locally contextual information.However,CNNs face challenges for descripting long-range semantic features,which will lead to relatively low classification...Convolutional neural network(CNN)has excellent ability to model locally contextual information.However,CNNs face challenges for descripting long-range semantic features,which will lead to relatively low classification accuracy of hyperspectral images.To address this problem,this article proposes an algorithm based on multiscale fusion and transformer network for hyperspectral image classification.Firstly,the low-level spatial-spectral features are extracted by multi-scale residual structure.Secondly,an attention module is introduced to focus on the more important spatialspectral information.Finally,high-level semantic features are represented and learned by a token learner and an improved transformer encoder.The proposed algorithm is compared with six classical hyperspectral classification algorithms on real hyperspectral images.The experimental results show that the proposed algorithm effectively improves the land cover classification accuracy of hyperspectral images.展开更多
Breast cancer is a significant threat to the global population,affecting not only women but also a threat to the entire population.With recent advancements in digital pathology,Eosin and hematoxylin images provide enh...Breast cancer is a significant threat to the global population,affecting not only women but also a threat to the entire population.With recent advancements in digital pathology,Eosin and hematoxylin images provide enhanced clarity in examiningmicroscopic features of breast tissues based on their staining properties.Early cancer detection facilitates the quickening of the therapeutic process,thereby increasing survival rates.The analysis made by medical professionals,especially pathologists,is time-consuming and challenging,and there arises a need for automated breast cancer detection systems.The upcoming artificial intelligence platforms,especially deep learning models,play an important role in image diagnosis and prediction.Initially,the histopathology biopsy images are taken from standard data sources.Further,the gathered images are given as input to the Multi-Scale Dilated Vision Transformer,where the essential features are acquired.Subsequently,the features are subjected to the Bidirectional Long Short-Term Memory(Bi-LSTM)for classifying the breast cancer disorder.The efficacy of the model is evaluated using divergent metrics.When compared with other methods,the proposed work reveals that it offers impressive results for detection.展开更多
针对视觉传感器采集到的图像进行三维人体姿态估计,提出一种双循环Transformer网络模型,有效地从二维关键关节点中提取时空维度高相关性特征,增大感受野,从而提高三维姿态估计的精度。通过在视觉传感器采集得到的公开数据集Human3.6M上...针对视觉传感器采集到的图像进行三维人体姿态估计,提出一种双循环Transformer网络模型,有效地从二维关键关节点中提取时空维度高相关性特征,增大感受野,从而提高三维姿态估计的精度。通过在视觉传感器采集得到的公开数据集Human3.6M上的仿真实验,验证了双循环Transformer算法的性能。分析结果表明,最终估计得到的三维人体关节点的平均关节点位置偏差MPJPE(Mean Per Joint Position Error)为41.6 mm,相比于现有方法有一定提升,可以应用到许多下游相关工作中,有着较强的应用价值。展开更多
Noise has traditionally been suppressed or eliminated in seismic data sets by the use of Fourier filters and, to a lesser degree, nonlinear statistical filters. Although these methods are quite useful under specific c...Noise has traditionally been suppressed or eliminated in seismic data sets by the use of Fourier filters and, to a lesser degree, nonlinear statistical filters. Although these methods are quite useful under specific conditions, they may produce undesirable effects for the low signal to noise ratio data. In this paper, a new method, multi-scale ridgelet transform, is used in the light of the theory of ridgelet transform. We employ wavelet transform to do sub-band decomposition for the signals and then use non-linear thresholding in ridgelet domain for every block. In other words, it is based on the idea of partition, at sufficiently fine scale, a curving singularity looks straight, and so ridgelet transform can work well in such cases. Applications on both synthetic data and actual seismic data from Sichuan basin, South China, show that the new method eliminates the noise portion of the signal more efficiently and retains a greater amount of geologic data than other methods, the quality and consecutiveness of seismic event are improved obviously as well as the quality of section is improved.展开更多
Retinal vessel segmentation in fundus images plays an essential role in the screening,diagnosis,and treatment of many diseases.The acquired fundus images generally have the following problems:uneven illumination,high ...Retinal vessel segmentation in fundus images plays an essential role in the screening,diagnosis,and treatment of many diseases.The acquired fundus images generally have the following problems:uneven illumination,high noise,and complex structure.It makes vessel segmentation very challenging.Previous methods of retinal vascular segmentation mainly use convolutional neural networks on U Network(U-Net)models,and they have many limitations and shortcomings,such as the loss of microvascular details at the end of the vessels.We address the limitations of convolution by introducing the transformer into retinal vessel segmentation.Therefore,we propose a hybrid method for retinal vessel segmentation based on modulated deformable convolution and the transformer,named DT-Net.Firstly,multi-scale image features are extracted by deformable convolution and multi-head selfattention(MHSA).Secondly,image information is recovered,and vessel morphology is refined by the proposed transformer decoder block.Finally,the local prediction results are obtained by the side output layer.The accuracy of the vessel segmentation is improved by the hybrid loss function.Experimental results show that our method obtains good segmentation performance on Specificity(SP),Sensitivity(SE),Accuracy(ACC),Curve(AUC),and F1-score on three publicly available fundus datasets such as DRIVE,STARE,and CHASE_DB1.展开更多
In the traditional incremental analysis update(IAU)process,all analysis increments are treated as constant forcing in a model’s prognostic equations over a certain time window.This approach effectively reduces high-f...In the traditional incremental analysis update(IAU)process,all analysis increments are treated as constant forcing in a model’s prognostic equations over a certain time window.This approach effectively reduces high-frequency oscillations introduced by data assimilation.However,as different scales of increments have unique evolutionary speeds and life histories in a numerical model,the traditional IAU scheme cannot fully meet the requirements of short-term forecasting for the damping of high-frequency noise and may even cause systematic drifts.Therefore,a multi-scale IAU scheme is proposed in this paper.Analysis increments were divided into different scale parts using a spatial filtering technique.For each scale increment,the optimal relaxation time in the IAU scheme was determined by the skill of the forecasting results.Finally,different scales of analysis increments were added to the model integration during their optimal relaxation time.The multi-scale IAU scheme can effectively reduce the noise and further improve the balance between large-scale and small-scale increments in the model initialization stage.To evaluate its performance,several numerical experiments were conducted to simulate the path and intensity of Typhoon Mangkhut(2018)and showed that:(1)the multi-scale IAU scheme had an obvious effect on noise control at the initial stage of data assimilation;(2)the optimal relaxation time for large-scale and small-scale increments was estimated as 6 h and 3 h,respectively;(3)the forecast performance of the multi-scale IAU scheme in the prediction of Typhoon Mangkhut(2018)was better than that of the traditional IAU scheme.The results demonstrate the superiority of the multi-scale IAU scheme.展开更多
基金supported by the National Natural Science Foundation of China under Grant 62177029the Postgraduate Research&Practice Innovation Program of Jiangsu Province(KYCX21_0740),China.
文摘Visual object tracking plays a crucial role in computer vision.In recent years,researchers have proposed various methods to achieve high-performance object tracking.Among these,methods based on Transformers have become a research hotspot due to their ability to globally model and contextualize information.However,current Transformer-based object tracking methods still face challenges such as low tracking accuracy and the presence of redundant feature information.In this paper,we introduce self-calibration multi-head self-attention Transformer(SMSTracker)as a solution to these challenges.It employs a hybrid tensor decomposition self-organizing multihead self-attention transformermechanism,which not only compresses and accelerates Transformer operations but also significantly reduces redundant data,thereby enhancing the accuracy and efficiency of tracking.Additionally,we introduce a self-calibration attention fusion block to resolve common issues of attention ambiguities and inconsistencies found in traditional trackingmethods,ensuring the stability and reliability of tracking performance across various scenarios.By integrating a hybrid tensor decomposition approach with a self-organizingmulti-head self-attentive transformer mechanism,SMSTracker enhances the efficiency and accuracy of the tracking process.Experimental results show that SMSTracker achieves competitive performance in visual object tracking,promising more robust and efficient tracking systems,demonstrating its potential to providemore robust and efficient tracking solutions in real-world applications.
基金supported in part by the National Natural Science Foundation of China(Grant No.62062003)Natural Science Foundation of Ningxia(Grant No.2023AAC03293).
文摘Computer-aided diagnosis of pneumonia based on deep learning is a research hotspot.However,there are some problems that the features of different sizes and different directions are not sufficient when extracting the features in lung X-ray images.A pneumonia classification model based on multi-scale directional feature enhancement MSD-Net is proposed in this paper.The main innovations are as follows:Firstly,the Multi-scale Residual Feature Extraction Module(MRFEM)is designed to effectively extract multi-scale features.The MRFEM uses dilated convolutions with different expansion rates to increase the receptive field and extract multi-scale features effectively.Secondly,the Multi-scale Directional Feature Perception Module(MDFPM)is designed,which uses a three-branch structure of different sizes convolution to transmit direction feature layer by layer,and focuses on the target region to enhance the feature information.Thirdly,the Axial Compression Former Module(ACFM)is designed to perform global calculations to enhance the perception ability of global features in different directions.To verify the effectiveness of the MSD-Net,comparative experiments and ablation experiments are carried out.In the COVID-19 RADIOGRAPHY DATABASE,the Accuracy,Recall,Precision,F1 Score,and Specificity of MSD-Net are 97.76%,95.57%,95.52%,95.52%,and 98.51%,respectively.In the chest X-ray dataset,the Accuracy,Recall,Precision,F1 Score and Specificity of MSD-Net are 97.78%,95.22%,96.49%,95.58%,and 98.11%,respectively.This model improves the accuracy of lung image recognition effectively and provides an important clinical reference to pneumonia Computer-Aided Diagnosis.
基金National Natural Science Foundation of China(No.62201457)Natural Science Foundation of Shaanxi Province(Nos.2022JQ-668,2022JQ-588)。
文摘Convolutional neural network(CNN)has excellent ability to model locally contextual information.However,CNNs face challenges for descripting long-range semantic features,which will lead to relatively low classification accuracy of hyperspectral images.To address this problem,this article proposes an algorithm based on multiscale fusion and transformer network for hyperspectral image classification.Firstly,the low-level spatial-spectral features are extracted by multi-scale residual structure.Secondly,an attention module is introduced to focus on the more important spatialspectral information.Finally,high-level semantic features are represented and learned by a token learner and an improved transformer encoder.The proposed algorithm is compared with six classical hyperspectral classification algorithms on real hyperspectral images.The experimental results show that the proposed algorithm effectively improves the land cover classification accuracy of hyperspectral images.
基金Deanship of Research and Graduate Studies at King Khalid University for funding this work through Small Group Research Project under Grant Number RGP1/261/45.
文摘Breast cancer is a significant threat to the global population,affecting not only women but also a threat to the entire population.With recent advancements in digital pathology,Eosin and hematoxylin images provide enhanced clarity in examiningmicroscopic features of breast tissues based on their staining properties.Early cancer detection facilitates the quickening of the therapeutic process,thereby increasing survival rates.The analysis made by medical professionals,especially pathologists,is time-consuming and challenging,and there arises a need for automated breast cancer detection systems.The upcoming artificial intelligence platforms,especially deep learning models,play an important role in image diagnosis and prediction.Initially,the histopathology biopsy images are taken from standard data sources.Further,the gathered images are given as input to the Multi-Scale Dilated Vision Transformer,where the essential features are acquired.Subsequently,the features are subjected to the Bidirectional Long Short-Term Memory(Bi-LSTM)for classifying the breast cancer disorder.The efficacy of the model is evaluated using divergent metrics.When compared with other methods,the proposed work reveals that it offers impressive results for detection.
文摘针对视觉传感器采集到的图像进行三维人体姿态估计,提出一种双循环Transformer网络模型,有效地从二维关键关节点中提取时空维度高相关性特征,增大感受野,从而提高三维姿态估计的精度。通过在视觉传感器采集得到的公开数据集Human3.6M上的仿真实验,验证了双循环Transformer算法的性能。分析结果表明,最终估计得到的三维人体关节点的平均关节点位置偏差MPJPE(Mean Per Joint Position Error)为41.6 mm,相比于现有方法有一定提升,可以应用到许多下游相关工作中,有着较强的应用价值。
基金supported by China Petrochemical key project during the 11th Five-year Plan as well as the Doctorate Fund of Ministry of Education of China (No.20050491504)
文摘Noise has traditionally been suppressed or eliminated in seismic data sets by the use of Fourier filters and, to a lesser degree, nonlinear statistical filters. Although these methods are quite useful under specific conditions, they may produce undesirable effects for the low signal to noise ratio data. In this paper, a new method, multi-scale ridgelet transform, is used in the light of the theory of ridgelet transform. We employ wavelet transform to do sub-band decomposition for the signals and then use non-linear thresholding in ridgelet domain for every block. In other words, it is based on the idea of partition, at sufficiently fine scale, a curving singularity looks straight, and so ridgelet transform can work well in such cases. Applications on both synthetic data and actual seismic data from Sichuan basin, South China, show that the new method eliminates the noise portion of the signal more efficiently and retains a greater amount of geologic data than other methods, the quality and consecutiveness of seismic event are improved obviously as well as the quality of section is improved.
基金supported in part by the National Natural Science Foundation of China under Grant 61972267the National Natural Science Foundation of Hebei Province under Grant F2018210148the University Science Research Project of Hebei Province under Grant ZD2021334.
文摘Retinal vessel segmentation in fundus images plays an essential role in the screening,diagnosis,and treatment of many diseases.The acquired fundus images generally have the following problems:uneven illumination,high noise,and complex structure.It makes vessel segmentation very challenging.Previous methods of retinal vascular segmentation mainly use convolutional neural networks on U Network(U-Net)models,and they have many limitations and shortcomings,such as the loss of microvascular details at the end of the vessels.We address the limitations of convolution by introducing the transformer into retinal vessel segmentation.Therefore,we propose a hybrid method for retinal vessel segmentation based on modulated deformable convolution and the transformer,named DT-Net.Firstly,multi-scale image features are extracted by deformable convolution and multi-head selfattention(MHSA).Secondly,image information is recovered,and vessel morphology is refined by the proposed transformer decoder block.Finally,the local prediction results are obtained by the side output layer.The accuracy of the vessel segmentation is improved by the hybrid loss function.Experimental results show that our method obtains good segmentation performance on Specificity(SP),Sensitivity(SE),Accuracy(ACC),Curve(AUC),and F1-score on three publicly available fundus datasets such as DRIVE,STARE,and CHASE_DB1.
基金jointly sponsored by the Shenzhen Science and Technology Innovation Commission (Grant No. KCXFZ20201221173610028)the key program of the National Natural Science Foundation of China (Grant No. 42130605)
文摘In the traditional incremental analysis update(IAU)process,all analysis increments are treated as constant forcing in a model’s prognostic equations over a certain time window.This approach effectively reduces high-frequency oscillations introduced by data assimilation.However,as different scales of increments have unique evolutionary speeds and life histories in a numerical model,the traditional IAU scheme cannot fully meet the requirements of short-term forecasting for the damping of high-frequency noise and may even cause systematic drifts.Therefore,a multi-scale IAU scheme is proposed in this paper.Analysis increments were divided into different scale parts using a spatial filtering technique.For each scale increment,the optimal relaxation time in the IAU scheme was determined by the skill of the forecasting results.Finally,different scales of analysis increments were added to the model integration during their optimal relaxation time.The multi-scale IAU scheme can effectively reduce the noise and further improve the balance between large-scale and small-scale increments in the model initialization stage.To evaluate its performance,several numerical experiments were conducted to simulate the path and intensity of Typhoon Mangkhut(2018)and showed that:(1)the multi-scale IAU scheme had an obvious effect on noise control at the initial stage of data assimilation;(2)the optimal relaxation time for large-scale and small-scale increments was estimated as 6 h and 3 h,respectively;(3)the forecast performance of the multi-scale IAU scheme in the prediction of Typhoon Mangkhut(2018)was better than that of the traditional IAU scheme.The results demonstrate the superiority of the multi-scale IAU scheme.