For underwater robots performing target detection tasks, color distortion and the uneven quality of underwater images make feature extraction very difficult for the model, which is therefore prone to false detections, missed detections, and poor accuracy. To address this, this paper proposes the CER-YOLOv7 (CBAM-EIoU-RepVGG-YOLOv7) underwater target detection algorithm. To improve the algorithm's ability to retain valid features from both the spatial and channel perspectives during feature extraction, a Convolutional Block Attention Module (CBAM) is added to the backbone network. A Reparameterization Visual Geometry Group (RepVGG) module is also inserted into the backbone to improve training and inference capabilities. The Efficient Intersection over Union (EIoU) loss is used as the localization loss function, which reduces the false detection and missed detection rates of the algorithm. Experiments on the UPRC (Underwater Robot Prototype Competition) dataset show that CER-YOLOv7 achieves a mAP (mean Average Precision) of 86.1%, a 2.2% improvement over YOLOv7. Ablation and comparison experiments confirm the feasibility and validity of CER-YOLOv7 and show that it is better suited to underwater target detection.
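For reference, the EIoU localization loss mentioned above is commonly written as 1 - IoU plus a center-distance term normalized by the squared diagonal of the smallest enclosing box, plus separate width and height penalties normalized by that box's width and height. The following is a minimal Python sketch of that formula for a single pair of boxes in (x1, y1, x2, y2) format; it is not the authors' implementation, and it omits batching and the small epsilon terms a training framework would add.

```python
def eiou_loss(pred, target):
    """EIoU loss between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # IoU term
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union

    # Width, height and squared diagonal of the smallest enclosing box
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2

    # Center-distance penalty
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    dist_term = (dx ** 2 + dy ** 2) / c2

    # Separate width and height penalties (what distinguishes EIoU from CIoU)
    dw = (px2 - px1) - (tx2 - tx1)
    dh = (py2 - py1) - (ty2 - ty1)
    shape_term = dw ** 2 / cw ** 2 + dh ** 2 / ch ** 2

    return 1.0 - iou + dist_term + shape_term


print(eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 - 1/7 + 1/9, about 0.968
```

Identical boxes give a loss of 0; the width and height terms push the predicted box toward the target's size directly rather than only through its aspect ratio.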
Traditional feature-based image stitching techniques often struggle with images that lack distinctive features or suffer from quality degradation. The scarcity of annotated datasets of real-world scenes severely undermines the reliability of supervised learning methods for image stitching. Furthermore, existing deep learning architectures designed for image stitching are often too bulky to be deployed on mobile and peripheral computing devices. To address these challenges, this study proposes a novel unsupervised image stitching method based on the YOLOv8 (You Only Look Once version 8) framework that incorporates deep homography networks and attention mechanisms. The method is divided into three stages. In the first stage, the attention mechanism is combined with a pooling pyramid model to enhance the detection and recognition of small objects in images, while the deep homography network module estimates the global homography of the input images across multiple viewpoints. The second stage performs preliminary stitching of the masks generated in the first stage and further refines the result through weighted computation to eliminate common stitching artifacts. The final stage adaptively reconstructs and carefully refines the initial stitching results. Comprehensive experiments on multiple datasets are conducted to assess the proposed model. The method improves the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) by 10.6% and 6%, respectively. These experimental results confirm the efficacy and utility of the proposed model.
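The global homography estimated by the deep homography network is a 3×3 matrix that maps pixel coordinates between views. As a rough illustration (not the paper's network), the sketch below applies a given homography to a point in homogeneous coordinates and computes where an image's corners land, which is how a stitching canvas is typically sized. The function names are hypothetical.

```python
def apply_homography(H, x, y):
    """Map the point (x, y) through a 3x3 homography H given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point is mapped to infinity")
    # Divide by the homogeneous coordinate to return to image coordinates
    return u / w, v / w


def warped_bounds(H, width, height):
    """Bounding box (x_min, y_min, x_max, y_max) of an image's corners after warping."""
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    pts = [apply_homography(H, x, y) for x, y in corners]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)


# A pure translation by (+5, -3) expressed as a homography
H_shift = [[1, 0, 5], [0, 1, -3], [0, 0, 1]]
print(warped_bounds(H_shift, 10, 10))  # (5.0, -3.0, 15.0, 7.0)
```

A full stitcher would then take the union of these bounds with the reference image's extent to allocate the output canvas before blending.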
Landslides account for the majority of geological incidents on slopes, posing severe threats to human lives and property and significantly affecting the geological environment. Rapid identification of landslides is important for disaster prevention and control; however, landslide identification currently relies mainly on the manual interpretation of remote sensing images. Manual interpretation and feature recognition methods are time-consuming and labor-intensive, and they struggle in complex scenarios. Automatic landslide recognition has therefore emerged as a pivotal direction for future development. In this study, a dataset of 2000 landslide images was constructed from open-source remote sensing images and datasets. The YOLOv7 model was enhanced using data augmentation algorithms and attention mechanisms, and three optimized models were formulated to realize automatic landslide recognition. The optimized model performs well in automatic landslide recognition, achieving a peak accuracy of 95.92%. It was subsequently applied to regional landslide identification, co-seismic landslide identification, and landslide recognition at various scales, showing robust recognition capability in all cases. Nevertheless, the model has limitations in detecting small targets, indicating room for further refinement of the deep learning algorithm. The results of this research offer valuable technical support for the rapid identification, prevention, and mitigation of landslide disasters.
AIM: To evaluate the application of an intelligent diagnostic model for pterygium. METHODS: For the intelligent diagnosis of pterygium, the attention mechanisms SENet, ECANet, CBAM, and Self-Attention were fused with the lightweight MobileNetV2 architecture to construct a three-class classification model. The study used 1220 anterior segment images in three categories related to pterygium, provided by the Eye Hospital of Nanjing Medical University. Conventional classification models (VGG16, ResNet50, MobileNetV2, and EfficientNetB7) were trained on the same dataset for comparison. Model performance was evaluated on 470 anterior segment test images in terms of accuracy, Kappa value, test time, sensitivity, specificity, area under the curve (AUC), and visual heat maps. RESULTS: The MobileNetV2+Self-Attention model, with a model size of 281 MB, achieved an accuracy of 92.77% and a Kappa value of 88.92%. The testing time was 9 ms/image on the server and 138 ms/image on a local computer. The sensitivity, specificity, and AUC were 99.47%, 100%, and 100%, respectively, for normal anterior segment images; 88.30%, 95.32%, and 96.70% for anterior segment images in the observation period; and 88.18%, 94.44%, and 97.30% for anterior segment images in the surgery period. CONCLUSION: The developed model is lightweight and can be used not only to detect pterygium but also to assess its severity.
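The Kappa value, sensitivity, and specificity reported above can all be derived from a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes, computes Cohen's kappa, and computes one-vs-rest sensitivity and specificity for a chosen class; the counts are invented for illustration and are not the study's data.

```python
def cohen_kappa(cm):
    """Cohen's kappa from a square confusion matrix (rows = true, cols = predicted)."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(k)) / n
    # Chance agreement: product of row and column marginals for each class
    p_expected = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(k)) for i in range(k)
    ) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


def sensitivity_specificity(cm, c):
    """One-vs-rest sensitivity and specificity for class index c."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    tp = cm[c][c]
    fn = sum(cm[c]) - tp
    fp = sum(cm[r][c] for r in range(k)) - tp
    tn = n - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts for (normal, observation period, surgery period)
cm = [[50, 0, 0],
      [0, 40, 10],
      [0, 5, 45]]
print(round(cohen_kappa(cm), 4))       # 0.85
print(sensitivity_specificity(cm, 1))  # (0.8, 0.95)
```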
With the rapid development of electric power systems, load estimation plays an important role in system operation and planning. Load estimation techniques generally include traditional, time-series, regression analysis-based, and machine learning-based methods. Because machine learning-based methods can achieve better performance, this paper proposes a deep learning-based load estimation algorithm using an image fingerprint and an attention mechanism. First, an image fingerprint construction method is proposed for the training data: after preprocessing, the training data matrix is constructed by cyclic shifting and cubic spline interpolation, and a linear mapping and a gray-color transformation are then applied to form the color image fingerprint. Second, a convolutional neural network (CNN) combined with an attention mechanism is proposed to improve training performance. Finally, an experiment is carried out to evaluate estimation performance. Compared with the support vector machine, CNN, and long short-term memory methods, the proposed algorithm achieves the best load estimation performance.
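A minimal sketch of the first two fingerprint steps, assuming the fingerprint is built from a one-dimensional load series: each row of the matrix is the series cyclically shifted by one more position, and the values are then min-max mapped to gray levels from 0 to 255. The cubic spline interpolation and the gray-to-color transformation are omitted, and the exact matrix layout used by the authors is not stated in the abstract, so treat this layout as an assumption.

```python
def cyclic_shift_matrix(series):
    """Row i is the series cyclically shifted right by i positions."""
    n = len(series)
    return [[series[(j - i) % n] for j in range(n)] for i in range(n)]


def to_gray(matrix):
    """Linearly map matrix values onto integer gray levels 0..255 (min-max scaling)."""
    lo = min(min(row) for row in matrix)
    hi = max(max(row) for row in matrix)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant series
    return [[round(255 * (v - lo) / span) for v in row] for row in matrix]


loads = [10, 20, 30, 50]        # hypothetical load samples
m = cyclic_shift_matrix(loads)  # [[10, 20, 30, 50], [50, 10, 20, 30], ...]
gray = to_gray(m)               # first row -> [0, 64, 128, 255]
```

The cyclic shift turns a 1-D series into a square 2-D array so that a CNN can treat it as an image.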
Recently, deep image-hiding techniques have attracted considerable attention in covert communication and high-capacity information hiding. However, these approaches have limitations, such as a lack of self-adaptability to the cover image, information leakage, or weak concealment. To address these issues, this study proposes a universal and adaptable image-hiding method. First, a domain attention mechanism is designed by incorporating atrous convolution, which makes better use of the relationship between the secret image domain and the cover image domain. Second, to improve perceptual similarity as judged by humans, a perceptual loss is incorporated into the training process. The experimental results are promising: the proposed method achieves an average pixel discrepancy (APD) of 1.83 and a peak signal-to-noise ratio (PSNR) of 40.72 dB between the cover and stego images, indicating high-quality output. Furthermore, the structural similarity index measure (SSIM) reaches 0.985, while the learned perceptual image patch similarity (LPIPS) is as low as 0.0001. Moreover, self-testing and cross-experiments demonstrate the model's adaptability and generalization in unknown hidden spaces, making it suitable for diverse computer vision tasks.
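APD and PSNR are standard pixel-level measures of how close the stego image is to the cover image. A minimal sketch for grayscale images stored as nested lists, assuming 8-bit pixels (peak value 255):

```python
import math


def apd(a, b):
    """Average absolute pixel difference between two same-sized images."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    sq = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    mse = sum(sq) / len(sq)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_val ** 2 / mse)


cover = [[100, 100], [100, 100]]
stego = [[101, 99], [100, 102]]
print(apd(cover, stego))   # 1.0
print(psnr(cover, stego))  # about 46.37 dB (MSE = 1.5)
```

Lower APD and higher PSNR both indicate that embedding the secret image changed the cover less.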
Quantum error correction, a technique that relies on redundancy to encode logical information into additional qubits and thereby better protect the system from noise, is necessary to build a viable quantum computer. The XYZ² code is a new topological stabilizer code defined on a cellular lattice: it is implemented on a hexagonal lattice of qubits and encodes logical qubits using weight-six and weight-two stabilizer measurements. However, topological stabilizer codes on cellular-lattice quantum systems suffer from the detrimental effects of noise caused by interaction with the environment, and several decoding approaches have been proposed to address this problem. Here, we propose a state-attention-based reinforcement learning decoder for XYZ² codes, which enables the decoder to focus more accurately on information related to the current decoding position. Under the optimized conditions, the error correction accuracy of our reinforcement learning decoder reaches 83.27% under the depolarizing noise model, and we measured thresholds of 0.18856 and 0.19043 for XYZ² codes at code distances of 3–7 and 7–11, respectively. Our study provides directions and ideas for applying decoding schemes that combine reinforcement learning and attention mechanisms to other topological quantum error-correcting codes.
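The depolarizing noise model referenced above is commonly simulated by giving each qubit, independently, an X, Y, or Z error with probability p/3 each and no error with probability 1 - p. The sketch below samples such error patterns; it assumes this standard convention (the abstract does not spell it out) and does not model the XYZ² lattice or its stabilizers.

```python
import random


def sample_depolarizing(num_qubits, p, rng=None):
    """Draw an independent Pauli error ('I', 'X', 'Y' or 'Z') for each qubit."""
    rng = rng or random.Random()
    errors = []
    for _ in range(num_qubits):
        if rng.random() < p:
            errors.append(rng.choice("XYZ"))  # each of X, Y, Z with probability p/3
        else:
            errors.append("I")
    return errors


print(sample_depolarizing(10, 0.1, random.Random(7)))
```

A decoder training loop would feed the syndrome produced by such an error pattern to the agent and reward it when its corrections restore the code state.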
Existing deep learning frameworks for the detection and segmentation of electrical equipment mainly suffer from low precision. Because deep learning-based video surveillance provides a reliable, safe, and easy-to-operate technology for the unmanned inspection of electrical equipment, this paper uses the bottleneck attention module (BAM) to improve the SOLOv2 model and proposes a new electrical equipment segmentation model. First, BAM is integrated into the feature extraction network to adaptively learn the correlation between feature channels, thereby improving the expressive ability of the feature map. Second, a weighted sum of the cross-entropy loss and the Dice loss is designed as the mask loss to improve the segmentation accuracy and robustness of the model. Finally, the non-maximum suppression (NMS) algorithm is used to better handle overlaps in instance segmentation. Experimental results show that the proposed method achieves a mean average precision (mAP) of 80.4% on datasets of three types of electrical equipment (transformers, insulators, and voltage transformers), more than 5.7% higher than the original SOLOv2 model. The proposed segmentation model can provide a targeted technical means for the intelligent management of power systems.
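The mask loss described above is a weighted sum of cross-entropy and Dice loss. A minimal sketch on a flattened binary mask, assuming per-pixel foreground probabilities; the weights `w_ce` and `w_dice` are hypothetical placeholders, since the abstract does not report the values used.

```python
import math


def mask_loss(probs, targets, w_ce=1.0, w_dice=1.0, eps=1e-6):
    """Weighted binary cross-entropy + Dice loss over a flattened mask."""
    n = len(probs)
    # Binary cross-entropy averaged over pixels (clamped to avoid log(0))
    ce = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for p, t in zip(probs, targets)
    ) / n
    # Soft Dice loss: 1 - 2|P ∩ T| / (|P| + |T|)
    inter = sum(p * t for p, t in zip(probs, targets))
    dice = 1 - (2 * inter + eps) / (sum(probs) + sum(targets) + eps)
    return w_ce * ce + w_dice * dice


print(mask_loss([0.5] * 4, [1, 0, 1, 0]))  # ln 2 + 0.5, about 1.193
```

Cross-entropy gives stable per-pixel gradients, while the Dice term directly rewards overlap and is less sensitive to the imbalance between foreground and background pixels.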
Multimodal sentiment analysis aims to understand people's emotions and opinions from diverse data. Concatenating or multiplying the various modalities is the traditional fusion method in multimodal sentiment analysis, but it does not exploit the correlation information between modalities. To solve this problem, this paper proposes a model based on a multi-head attention mechanism. First, the raw data are preprocessed and the feature representations are converted into sequences of word vectors, and positional encoding is introduced to better capture the semantic and sequential information in the input sequence. Next, the encoded input sequence is fed into a transformer model for further processing and learning. In the transformer layer, a cross-modal attention module consisting of a pair of multi-head attention modules is employed to capture the correlation between modalities. Finally, the processed results are passed through a feedforward neural network and a classification layer to obtain the sentiment output. With this processing flow, the model can capture semantic information and contextual relationships, achieving good results in various natural language processing tasks. The model was tested on the CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) dataset and the Multimodal EmotionLines Dataset (MELD), achieving an accuracy of 82.04% and an F1 score of 80.59% on the former.
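The cross-modal attention step can be illustrated with single-head scaled dot-product attention in which queries come from one modality and keys and values come from another; the model described above uses a pair of multi-head modules, one per direction, with learned projections that this sketch omits. The sinusoidal positional encoding shown is the standard Transformer form and is assumed rather than confirmed by the abstract.

```python
import math


def positional_encoding(pos, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    pe = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries, keys, values):
    """Queries from modality A attend over keys/values from modality B."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


text_q = [[1.0, 0.0], [0.0, 1.0]]       # hypothetical text features
audio_kv = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical audio features
print(cross_attention(text_q, audio_kv, audio_kv))
```

Running the same function with the roles of the two modalities swapped gives the second direction of the pair.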
To improve the prediction accuracy of chaotic time series and reconstruct a more reasonable phase space structure for the prediction network, we propose a convolutional neural network-long short-term memory (CNN-LSTM) prediction model based on an incremental attention mechanism. First, a traversal layer performs a traversal search over a finite set of phase space parameters. Then, an incremental attention layer evaluates these parameters according to the dimension weight criterion (DWC), and the phase space parameters that best satisfy DWC are selected and fed into the input layer. Finally, the constructed CNN-LSTM network extracts spatio-temporal features and produces the final prediction. The model is verified on the Logistic, Lorenz, and sunspot chaotic time series, and its performance is compared in terms of both prediction accuracy and network phase space structure. In addition, the CNN-LSTM network based on incremental attention is compared in prediction accuracy with long short-term memory (LSTM), convolutional neural network (CNN), recurrent neural network (RNN), and support vector regression (SVR) models. The results indicate that the proposed composite network is better at extracting temporal features and achieves higher prediction accuracy. The phase space parameter estimation algorithm is also compared with three typical traditional methods for determining chaotic phase space parameters: Cao's method, false nearest neighbors, and the C-C method. The experiments show that the phase space parameter estimation algorithm based on the incremental attention mechanism yields higher prediction accuracy than traditional phase space reconstruction across five networks: CNN-LSTM, LSTM, CNN, RNN, and SVR.
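The phase space parameters searched above are typically the embedding dimension m and the delay τ used in time-delay embedding. As a sketch under that assumption, the code below builds delay vectors from a Logistic-map series; a traversal search would loop over candidate (m, τ) pairs, but the DWC scoring itself is not described in the abstract and is not implemented here.

```python
def logistic_series(x0, length, r=4.0):
    """Chaotic Logistic-map series x_{n+1} = r * x_n * (1 - x_n)."""
    xs = [x0]
    for _ in range(length - 1):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


def delay_embed(series, m, tau):
    """Time-delay embedding: vectors [x_i, x_{i+tau}, ..., x_{i+(m-1)tau}]."""
    n = len(series) - (m - 1) * tau
    if n <= 0:
        raise ValueError("series too short for the chosen m and tau")
    return [[series[i + k * tau] for k in range(m)] for i in range(n)]


series = logistic_series(0.3, 100)
# Hypothetical candidate grid that a traversal search could iterate over
candidates = [(m, tau) for m in range(2, 6) for tau in range(1, 4)]
vectors = delay_embed(series, m=3, tau=1)  # 98 vectors of length 3
```

Each candidate pair yields a different set of input vectors; a selection criterion such as the paper's DWC would then decide which set is fed to the network.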
Nano-computed tomography (Nano-CT) is an emerging high-resolution imaging technique. However, because of its low-light characteristics, tabletop Nano-CT must scan under long exposures, which makes the scanning process time-consuming. For 3D reconstructed data, this paper proposes a lightweight 3D noise reduction method for desktop Nano-CT called AAD-ResNet (Axial Attention DeNoise ResNet). The network is built on the U-net structure, and both the encoder and the decoder incorporate the proposed 3D axial attention mechanism and residual dense blocks. Each layer of a residual dense block can directly access the features of the previous layer, which reduces parameter redundancy and improves training efficiency. The 3D axial attention mechanism strengthens the correlation of 3D information during training and captures long-range dependencies, which improves noise reduction while avoiding the loss of structural image details. Experimental results show that the network can raise the image quality of a 0.1-s exposure scan to a level close to that of a 3-s exposure, significantly shortening sample scanning time.
Fusarium head blight (FHB) is one of the most destructive diseases in global wheat production. To count FHB-infected wheat ears under field conditions, this study proposes a diseased wheat ear detection algorithm based on an improved YOLOv5s (Tr-YOLOv5s). The Swin Transformer replaces the CSPDarknet backbone network to enhance the extraction of features of FHB-infected wheat ear populations against field backgrounds. The convolutional block attention module (CBAM) is added to improve the detection of target wheat ears and thus the overall accuracy of the model. The original complete intersection over union (CIoU) loss function is replaced by the Scylla intersection over union (SIoU) loss to accelerate model convergence and reduce the loss value. The mean average precision (mAP) of the Tr-YOLOv5s model reached 90.64%, a 4.63% improvement over the original YOLOv5s model. The improved model can quickly detect and count FHB-infected wheat ears in the field, laying a foundation for subsequent automatic identification and grading of wheat FHB under field conditions.
The intensive application of deep learning in medical image processing has advanced research on automatic retinal vessel segmentation. To overcome the limitation that traditional U-shaped vessel segmentation networks fail to sufficiently extract features from fundus images, we propose a novel network (DSeU-net) based on deformable convolution and a squeeze-and-excitation residual module. Deformable convolution dynamically adjusts the receptive field for retinal vessel feature extraction, and the squeeze-and-excitation residual module scales the weights of low-level features so that the network efficiently learns the complex relationships between different feature layers. We validate DSeU-net on three public retinal vessel segmentation datasets (DRIVE, CHASEDB1, and STARE), and the experimental results demonstrate its satisfactory segmentation performance.
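A squeeze-and-excitation residual unit pools each channel to a scalar, passes the result through two small fully connected layers with ReLU and sigmoid, rescales the channels, and adds the shortcut. The pure-Python sketch below assumes the residual branch output is already computed and omits biases; the weight shapes follow the usual SE design with a reduction ratio r, not necessarily the paper's exact configuration.

```python
import math


def se_residual(identity, branch, w1, w2):
    """Squeeze-and-excitation on a residual branch, then add the shortcut.

    identity, branch: lists of C channels, each an H x W nested list.
    w1: (C // r) x C weights, w2: C x (C // r) weights (biases omitted).
    """
    # Squeeze: global average pooling of each branch channel
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in branch]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one weight per channel
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    scale = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Rescale branch channels and add the identity shortcut
    return [
        [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(ci, cb)]
        for ci, cb, s in zip(identity, branch, scale)
    ]


x = [[[1.0]]]   # one channel, 1 x 1
f = [[[2.0]]]   # hypothetical branch output
print(se_residual(x, f, w1=[[0.0]], w2=[[0.0]]))  # sigmoid(0) = 0.5 -> [[[2.0]]]
```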
Background: Magnetic resonance imaging (MRI) has played an important role in the rapid growth of medical imaging diagnostic technology, especially in the diagnosis and treatment of brain tumors, owing to its non-invasive nature and superior soft tissue contrast. However, because brain tumors are invasive and highly heterogeneous, they appear highly non-uniform with indistinct boundaries in MRI images. In addition, labeling tumor areas is time-consuming and laborious. Methods: To address these issues, this study improves the classical segmentation network U-net with a residual grouped convolution module, a convolutional block attention module, and bilinear interpolation upsampling. The influence of network normalization, loss function, and network depth on segmentation performance is also examined. Results: In the experiments, the Dice score of the proposed segmentation model reached 97.581%, 12.438% higher than that of the traditional U-net, demonstrating effective segmentation of MRI brain tumor images. Conclusions: The improved U-net achieves good segmentation of brain tumor MRI images.
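Bilinear upsampling interpolates each output pixel from its four nearest input pixels. The sketch below uses the align-corners convention (output corners coincide with input corners); frameworks differ in this detail, and the abstract does not say which variant the improved U-net uses.

```python
def bilinear_upsample(img, out_h, out_w):
    """Resize a 2-D list with bilinear interpolation (align-corners convention)."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the output row back to a fractional input row
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate horizontally on the two neighboring rows, then vertically
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


print(bilinear_upsample([[0, 2], [4, 6]], 3, 3))
# [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
```

Unlike a transposed convolution, this upsampling has no learnable parameters, which keeps the decoder lighter.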
Multispectral pedestrian detection leverages infrared images to supplement visible-light images with reliable information, showing significant advantages in low-light conditions and under background occlusion. However, while cross-modal feature extraction and fusion continue to improve, maintaining the model's detection speed remains challenging. We design a deep learning network for cross-modal pedestrian detection based on ResNet50 that focuses on more reliable features and improves detection efficiency. The model uses a spatial attention mechanism to reweight the input visible-light and infrared image data, strengthening its focus on different spatial positions, and shares the weighted features across modalities, thereby reducing interference between multimodal features. Lightweight modules with depthwise separable convolution are then incorporated to reduce the parameter count and computational load through channel-wise (depthwise) and point-wise convolutions. The proposed network was validated on the publicly available KAIST dataset and compared with existing methods. The experimental results show that the approach performs favorably in various complex environments, confirming the effectiveness of the proposed multispectral pedestrian detection technique.
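The parameter savings from depthwise separable convolution follow directly from its structure: one k×k depthwise filter per input channel followed by a 1×1 pointwise convolution. A quick sketch of the counts (biases omitted), with layer sizes chosen only for illustration:

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) + 1 x 1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


std = standard_conv_params(3, 256, 256)        # 589824
sep = depthwise_separable_params(3, 256, 256)  # 2304 + 65536 = 67840
print(sep / std)  # about 0.115, i.e. 1/c_out + 1/k**2
```

The same ratio applies to multiply-accumulate operations per output pixel, which is where the computational savings come from.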
Class Title: Radiological Imaging Methods: A Comprehensive Overview. Purpose: This GPT paper provides an overview of the different forms of radiological imaging, the diagnostic capabilities they offer, and recent advances in the field. Materials and Methods: This paper reviews conventional radiography, digital radiography, panoramic radiography, computed tomography, and cone-beam computed tomography. Recent advances in radiological imaging, such as imaging diagnosis and modern computer-aided diagnosis systems, are also discussed. Results: This paper details the differences between the imaging techniques, the benefits of each, and current advances in the field that aid in the diagnosis of medical conditions. Conclusion: Radiological imaging is an extremely important tool for medical diagnosis in modern medicine. This work provides an overview of the types of imaging techniques used, recent advances, and their potential applications.
Accurate load forecasting is a crucial foundation for implementing household demand response plans and optimizing load scheduling. For short-term load data with substantial fluctuations, a single prediction model struggles to capture temporal features effectively, which reduces prediction accuracy. In this study, a hybrid deep learning framework integrating an attention mechanism, a convolutional neural network (CNN), improved chaotic particle swarm optimization (ICPSO), and long short-term memory (LSTM) is proposed for short-term household load forecasting. First, the CNN model extracts features from the original data to enhance the quality of the data features. Next, the moving average method is used for data preprocessing, and the LSTM network then predicts the processed data. Moreover, the ICPSO algorithm is introduced to optimize the LSTM parameters, with the aim of improving the model's running speed and accuracy. Finally, the attention mechanism is applied to the LSTM output, effectively addressing the information loss that LSTM suffers on long sequences and further improving prediction accuracy. Numerical analysis verifies the accuracy and effectiveness of the proposed hybrid model: it extracts data features adeptly and achieves higher prediction accuracy than other forecasting methods for household loads with significant fluctuations across different seasons.
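As an illustration of the chaotic initialization idea behind ICPSO, the sketch below seeds a standard global-best particle swarm with Logistic-map iterates and minimizes a toy function. The specific improvements in ICPSO and the LSTM hyperparameters it tunes are not described in the abstract, so this is plain PSO with chaotic initialization, and all parameter values are assumptions.

```python
import random


def logistic_init(n, dim, lo, hi, rng):
    """Chaotic initialization: successive Logistic-map iterates x <- 4x(1 - x)."""
    x = [rng.uniform(0.01, 0.99) for _ in range(dim)]
    particles = []
    for _ in range(n):
        x = [4.0 * v * (1.0 - v) for v in x]
        particles.append([lo + v * (hi - lo) for v in x])
    return particles


def pso(f, dim, lo, hi, n=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over the box [lo, hi]^dim with a global-best PSO."""
    rng = random.Random(seed)
    pos = logistic_init(n, dim, lo, hi, rng)
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep particles inside the search box
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    return sum(v * v for v in x)


best, best_val = pso(sphere, dim=2, lo=-5.0, hi=5.0)
```

In the forecasting setting, `f` would instead train and validate an LSTM for a given hyperparameter vector and return its validation error, which is far more expensive per evaluation.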
Unsupervised multi-modal image translation is an emerging area of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain. However, advanced existing approaches employ a multi-generator mechanism to model different domain mappings, which leads to inefficient neural network training and mode collapse, limiting the diversity of generated images. To address this issue, this paper introduces a multi-modal unsupervised image translation framework that performs multi-modal image translation with a single generator. Specifically, a domain code is first introduced to explicitly control the different generation tasks. Second, a squeeze-and-excitation (SE) mechanism and a feature attention (FA) module are incorporated. Finally, the model integrates multiple optimization objectives to ensure efficient multi-modal translation. Qualitative and quantitative experiments on multiple unpaired benchmark image translation datasets demonstrate the benefits of the proposed method over existing techniques. Overall, the experimental results show that the proposed method is versatile and scalable.
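One common way for a single generator to handle several target domains is to tile a one-hot domain code into constant input planes, as done in StarGAN-style models. The sketch below shows that conditioning step under the assumption that the framework above encodes domains similarly; the abstract does not specify the encoding, and `add_domain_code` is a hypothetical helper.

```python
def add_domain_code(image, domain, num_domains):
    """Append one constant plane per domain: ones for the target domain, zeros elsewhere.

    image: list of C channels, each an H x W nested list.
    Returns a list of C + num_domains channels.
    """
    h, w = len(image[0]), len(image[0][0])
    planes = [
        [[1.0 if d == domain else 0.0] * w for _ in range(h)]
        for d in range(num_domains)
    ]
    return image + planes


img = [[[0.2, 0.4], [0.6, 0.8]]]                   # one 2 x 2 channel
conditioned = add_domain_code(img, domain=1, num_domains=3)
print(len(conditioned))                            # 4 channels
```

Changing only `domain` switches the generation task while the generator weights stay shared.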
Various deep learning models have been proposed for the accurate assisted diagnosis of early-stage Alzheimer's disease (AD). Most studies employ Convolutional Neural Networks (CNNs), which focus solely on local features and therefore have difficulty capturing global features. Moreover, in contrast to natural images, Structural Magnetic Resonance Imaging (sMRI) images have more channel dimensions, yet the Position Embedding stage of Multi-Head Self-Attention (MHSA) disregards the coding information related to the channel dimension. To tackle these issues, we propose RepBoTNet-CESA, an advanced AD-aided diagnostic model that learns local and global features simultaneously. It combines the strength of CNNs in capturing local information with that of Transformers in integrating global information, reducing computational cost while achieving excellent classification performance. It also uses the Cubic Embedding Self Attention (CESA) proposed in this paper to incorporate channel code information, enhancing classification performance within the Transformer structure. RepBoTNet-CESA performs well across AD-aided diagnosis tasks, with an accuracy of 96.58%, precision of 97.26%, and recall of 96.23% in the AD/NC task; an accuracy of 92.75%, precision of 92.84%, and recall of 93.18% in the EMCI/NC task; and an accuracy of 80.97%, precision of 83.86%, and recall of 80.91% in the AD/EMCI/LMCI/NC task. Furthermore, our study shows that MHSA outperforms conventional attention mechanisms in enhancing ResNet performance, whereas a deeper RepBoTNet-CESA network does not yield further gains in AD-aided diagnostic tasks.
Funding: Scientific Research Fund of Liaoning Provincial Education Department (No. JGLX2021030): Research on Vision-Based Intelligent Perception Technology for the Survival of Benthic Organisms.
Funding: Science and Technology Research Project of Henan Province (222102240014).
Funding: The authors sincerely appreciate the valuable comments from the anonymous reviewers. The team of Ji Shunping from Wuhan University is acknowledged for supplying open-source remote sensing data. This research was supported by the Second Tibetan Plateau Scientific Expedition and Research Program (Grant No. 2019QZKK0904) and the National Natural Science Foundation of China (Grant No. U22A20597).
Abstract: Landslides account for the majority of geological incidents on slopes, posing severe threats to human life and property and significantly affecting the geological environment. Rapid identification of landslides is important for disaster prevention and control; however, landslide identification currently relies mainly on manual interpretation of remote sensing images. Manual interpretation and feature recognition methods are time-consuming, labor-intensive, and struggle with complex scenarios. Consequently, automatic landslide recognition has emerged as a pivotal direction for future development. In this study, a dataset of 2000 landslide images was constructed from open-source remote sensing images and datasets. The YOLOv7 model was enhanced with data augmentation algorithms and attention mechanisms, and three optimized models were formulated for automatic landslide recognition. The optimized model performs well in automatic landslide recognition, achieving a peak accuracy of 95.92%. It was then applied to regional landslide identification, co-seismic landslide identification, and landslide recognition at various scales, all of which showed robust recognition capability. Nevertheless, the model has limitations in detecting small targets, indicating room for refining the deep learning algorithm. The results of this research offer valuable technical support for the rapid identification, prevention, and mitigation of landslide disasters.
Funding: Supported by the National Natural Science Foundation of China (No. 61906066), the Scientific Research Fund of Zhejiang Provincial Education Department (No. Y202147191), Huzhou University Graduate Research Innovation Project (No. 2020KYCX21), Sanming Project of Medicine in Shenzhen (SZSM202311012), and Shenzhen Science and Technology Program (No. JCYJ20220530153604010).
Abstract: AIM: To evaluate the application of an intelligent diagnostic model for pterygium. METHODS: For intelligent diagnosis of pterygium, attention mechanisms (SENet, ECANet, CBAM, and Self-Attention) were fused with the lightweight MobileNetV2 structure to construct a three-class classification model. The study used 1220 anterior segment images in three categories related to pterygium, provided by the Eye Hospital of Nanjing Medical University. Conventional classification models (VGG16, ResNet50, MobileNetV2, and EfficientNetB7) were trained on the same dataset for comparison. Model performance was evaluated on 470 anterior segment test images in terms of accuracy, Kappa value, test time, sensitivity, specificity, area under the curve (AUC), and visual heat maps. RESULTS: The MobileNetV2+Self-Attention model, 281 MB in size, achieved an accuracy of 92.77% and a Kappa value of 88.92%. Testing took 9 ms/image on the server and 138 ms/image on a local computer. The sensitivity, specificity, and AUC were 99.47%, 100%, and 100%, respectively, for normal anterior segment images; 88.30%, 95.32%, and 96.70% for anterior segment images in the observation period; and 88.18%, 94.44%, and 97.30% for anterior segment images in the surgery period. CONCLUSION: The developed model is lightweight and can be used not only for detection but also for assessing the severity of pterygium.
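The Kappa value, sensitivity, and specificity reported above can all be computed from a confusion matrix. The sketch below is a generic illustration, not the authors' evaluation script: rows are assumed to be true classes and columns predicted classes, sensitivity and specificity are computed one-vs-rest for a chosen class, and the 2 x 2 counts are hypothetical. The same functions work unchanged for the paper's three classes.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows = true class, cols = predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal class frequencies
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)


def sensitivity_specificity(confusion, cls):
    """One-vs-rest sensitivity and specificity for class index `cls`."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp                        # true cls, predicted other
    fp = sum(confusion[i][cls] for i in range(k)) - tp   # predicted cls, true other
    tn = n - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


cm = [[20, 5], [10, 15]]
print(cohen_kappa(cm))                  # about 0.4
print(sensitivity_specificity(cm, 0))   # (0.8, 0.6)
```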
Abstract: With the rapid development of electric power systems, load estimation plays an important role in system operation and planning. Load estimation techniques generally include traditional, time-series, regression-analysis-based, and machine-learning-based methods. Since machine-learning-based methods can achieve better performance, this paper proposes a deep learning-based load estimation algorithm using image fingerprints and an attention mechanism. First, an image fingerprint construction method is proposed for the training data: after preprocessing, the training data matrix is constructed by cyclic shifting and cubic spline interpolation, and linear mapping and gray-color transformation are then applied to form a color image fingerprint. Second, a convolutional neural network (CNN) combined with an attention mechanism is proposed to improve training performance. Finally, an experiment is carried out to evaluate estimation performance. Compared with the support vector machine, CNN, and long short-term memory methods, the proposed algorithm achieves the best load estimation performance.
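The abstract does not give the exact fingerprint construction, so the sketch below rests on two assumptions: that each row of the data matrix is the load sequence cyclically shifted by one more step, and that "linear mapping" means min-max scaling to 256 gray levels. Cubic spline interpolation and the gray-to-color transform are omitted, and the function names and readings are hypothetical.

```python
def cyclic_shift_matrix(series):
    """Row i is the series rotated left by i steps, giving an n x n matrix."""
    n = len(series)
    return [series[i:] + series[:i] for i in range(n)]


def to_gray_levels(matrix, levels=256):
    """Linearly map all values to integer gray levels 0 .. levels-1 (min-max scaling)."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant matrix
    return [[round((v - lo) / span * (levels - 1)) for v in row] for row in matrix]


loads = [3.0, 5.0, 4.0, 6.0]  # hypothetical load readings
image = to_gray_levels(cyclic_shift_matrix(loads))
print(image[0])  # [0, 170, 85, 255]
```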
Funding: Supported by the National Key R&D Program of China (Grant Number 2021YFB2700900), the National Natural Science Foundation of China (Grant Numbers 62172232, 62172233), and the Jiangsu Basic Research Program Natural Science Foundation (Grant Number BK20200039).
Abstract: Recently, deep image-hiding techniques have attracted considerable attention in covert communication and high-capacity information hiding. However, these approaches have limitations, such as a lack of adaptability to the cover image, information leakage, or weak concealment. To address these issues, this study proposes a universal and adaptable image-hiding method. First, a domain attention mechanism is designed in combination with atrous convolution, making better use of the relationship between the secret image domain and the cover image domain. Second, to improve perceived similarity for human observers, perceptual loss is incorporated into the training process. The proposed method achieves an average pixel discrepancy (APD) of 1.83 and a peak signal-to-noise ratio (PSNR) of 40.72 dB between the cover and stego images, indicating high-quality output. Furthermore, the structural similarity index measure (SSIM) reaches 0.985, and the learned perceptual image patch similarity (LPIPS) is as low as 0.0001. Self-testing and cross-experiments also demonstrate the model's adaptability and generalization to unknown hidden spaces, making it suitable for diverse computer vision tasks.
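APD and PSNR between cover and stego images can be computed as below. This sketch assumes 8-bit single-channel images stored as nested lists and takes APD to be the mean absolute per-pixel difference; the pixel values are made up for illustration.

```python
import math


def _pixel_pairs(a, b):
    """Yield aligned pixel pairs from two equally sized 2D images."""
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            yield x, y


def apd(a, b):
    """Average pixel discrepancy: mean absolute difference per pixel."""
    diffs = [abs(x - y) for x, y in _pixel_pairs(a, b)]
    return sum(diffs) / len(diffs)


def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    sq = [(x - y) ** 2 for x, y in _pixel_pairs(a, b)]
    mse = sum(sq) / len(sq)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / mse)


cover = [[100, 120], [130, 140]]
stego = [[101, 119], [130, 142]]
print(apd(cover, stego))   # 1.0
```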
Funding: Supported by the Natural Science Foundation of Shandong Province, China (Grant No. ZR2021MF049) and the Joint Fund of Natural Science Foundation of Shandong Province (Grant Nos. ZR2022LLZ012 and ZR2021LLZ001).
Abstract: Quantum error correction, a technique that relies on redundancy to encode logical information into additional qubits and thereby protect the system from noise, is necessary for building a viable quantum computer. The XYZ² code, a new topological stabilizer code defined on a cellular lattice, is implemented on a hexagonal lattice of qubits and encodes logical qubits using stabilizer measurements of weight six and weight two. However, topological stabilizer codes in cellular lattice quantum systems suffer from the detrimental effects of noise caused by interaction with the environment, and several decoding approaches have been proposed to address this problem. Here, we propose a state-attention-based reinforcement learning decoder for XYZ² codes, which enables the decoder to focus more accurately on information related to the current decoding position. Under optimized conditions, the error correction accuracy of our reinforcement learning decoder reaches 83.27% under the depolarizing noise model, and we measured thresholds of 0.18856 and 0.19043 for XYZ² codes at code distances of 3–7 and 7–11, respectively. Our study provides directions and ideas for applying decoding schemes that combine reinforcement learning and attention mechanisms to other topological quantum error-correcting codes.
Funding: Jilin Science and Technology Development Plan Project (No. 20200403075SF) and Doctoral Research Start-Up Fund of Northeast Electric Power University (No. BSJXM-2018202).
Abstract: The main problem with existing deep learning frameworks for detecting and segmenting electrical equipment is low precision. Since deep learning-based video surveillance provides a reliable, safe, and easy-to-operate technology for unmanned inspection of electrical equipment, this paper uses the bottleneck attention module (BAM) to improve the SOLOv2 model and proposes a new electrical equipment segmentation model. First, BAM is integrated into the feature extraction network to adaptively learn the correlation between feature channels, improving the expressive ability of the feature map. Second, a weighted sum of cross-entropy loss and Dice loss is designed as the mask loss to improve the segmentation accuracy and robustness of the model. Finally, the non-maximum suppression (NMS) algorithm is applied to better handle the overlap problem in instance segmentation. Experimental results show that the proposed method achieves a mean average precision (mAP) of 80.4% on datasets of three types of electrical equipment (transformers, insulators, and voltage transformers), an improvement of more than 5.7% over the original SOLOv2 model. The proposed segmentation model can provide a technical means for the intelligent management of power systems.
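The mask loss described above combines cross-entropy and Dice terms. The abstract specifies only a weighted sum, so in this sketch the 0.5/0.5 weights, the binary (per-instance, flattened-pixel) form of cross-entropy, and the name `mask_loss` are assumptions for illustration.

```python
import math


def mask_loss(probs, target, w_ce=0.5, w_dice=0.5, eps=1e-7):
    """Weighted sum of binary cross-entropy and Dice loss over flattened mask pixels.

    probs:  predicted foreground probabilities in [0, 1]
    target: ground-truth labels (0 or 1), same length as probs
    """
    n = len(probs)
    # Binary cross-entropy, with probabilities clamped to avoid log(0)
    bce = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for p, t in zip(probs, target)
    ) / n
    # Soft Dice coefficient: 2 * sum(p * t) / (sum(p) + sum(t))
    inter = sum(p * t for p, t in zip(probs, target))
    dice = (2 * inter + eps) / (sum(probs) + sum(target) + eps)
    return w_ce * bce + w_dice * (1 - dice)
```

The cross-entropy term penalizes each pixel independently, while the Dice term measures overlap for the mask as a whole, which keeps small instances from being dominated by background pixels.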
Funding: Supported by the National Natural Science Foundation of China under Grant 61702462, the Henan Provincial Science and Technology Research Project under Grants 222102210010 and 222102210064, the Research and Practice Project of Higher Education Teaching Reform in Henan Province under Grants 2019SJGLX320 and 2019SJGLX020, the Undergraduate Universities Smart Teaching Special Research Project of Henan Province under Grant JiaoGao [2021] No. 489-29, and the Academic Degrees & Graduate Education Reform Project of Henan Province under Grant 2021SJGLX115Y.
Abstract: Multimodal sentiment analysis aims to understand people's emotions and opinions from diverse data. Concatenating or multiplying modalities is a traditional fusion method for multimodal sentiment analysis, but it does not exploit the correlation between modalities. To solve this problem, this paper proposes a model based on a multi-head attention mechanism. First, the original data are preprocessed. Then, the feature representations are converted into sequences of word vectors, and positional encoding is introduced to better capture the semantic and sequential information in the input sequence. Next, the encoded input sequence is fed into a transformer model for further processing and learning. In the transformer layer, a cross-modal attention consisting of a pair of multi-head attention modules is employed to capture the correlation between modalities. Finally, the processed results are fed into a feedforward neural network, and the classification layer produces the emotional output. Through this processing flow, the model captures semantic information and contextual relationships and achieves good results on various natural language processing tasks. The model was tested on the CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) dataset and the Multimodal EmotionLines Dataset (MELD), achieving an accuracy of 82.04% and an F1 score of 80.59% on the former.
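The cross-modal attention described above lets one modality's sequence query another's. The sketch below is a simplified, single-head version with no learned query/key/value projections, no positional encoding, and hypothetical two-dimensional features; the paper's pair of multi-head modules would correspond roughly to running it in both directions, with several learned heads per direction.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Single-head scaled dot-product attention across modalities.

    queries: vectors from modality A (e.g. text), one per position
    keys, values: vectors from modality B (e.g. audio), aligned by position
    Returns one output vector per query: a weighted mix of modality-B values.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return outputs


text = [[1.0, 0.0], [0.0, 1.0]]                # hypothetical text features
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical audio features
text_attends_audio = cross_modal_attention(text, audio, audio)
audio_attends_text = cross_modal_attention(audio, text, text)
```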
Abstract: To improve the prediction accuracy of chaotic time series and reconstruct a more reasonable phase space structure for the prediction network, we propose a convolutional neural network-long short-term memory (CNN-LSTM) prediction model based on an incremental attention mechanism. First, the traversal layer performs a traversal search over a finite set of phase space parameters. Then, an incremental attention layer judges the parameters based on the dimension weight criteria (DWC), and the phase space parameters that best meet the DWC are selected and fed into the input layer. Finally, the constructed CNN-LSTM network extracts spatio-temporal features and produces the final prediction. The model is verified on the Logistic, Lorenz, and sunspot chaotic time series, and its performance is compared in terms of prediction accuracy and network phase space structure. Additionally, the incremental-attention-based CNN-LSTM network is compared with long short-term memory (LSTM), convolutional neural network (CNN), recurrent neural network (RNN), and support vector regression (SVR) models in prediction accuracy. The results indicate that the proposed composite model has an enhanced ability to extract temporal features and achieves higher prediction accuracy. The phase space parameter estimation algorithm is also compared with three typical methods for determining chaotic phase space parameters: Cao's method, false nearest neighbors, and the C-C method. The experiments show that the incremental-attention-based estimation algorithm yields higher prediction accuracy than the traditional phase space reconstruction methods across all five networks: CNN-LSTM, LSTM, CNN, RNN, and SVR.
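The phase space parameters being searched are usually the embedding dimension m and delay tau of a time-delay embedding, the same quantities that Cao's method, false nearest neighbors, and the C-C method estimate. The sketch below assumes that setting: it generates a Logistic-map series and builds the delay vectors for one candidate (m, tau). The traversal search and DWC scoring are not described in enough detail to reproduce, so candidate selection is left out; the function names and parameter values are hypothetical.

```python
def logistic_series(x0, r, n):
    """Generate n points of the logistic map x_{t+1} = r * x_t * (1 - x_t)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


def delay_embed(series, m, tau):
    """Time-delay embedding: each row is [x_t, x_{t+tau}, ..., x_{t+(m-1)tau}]."""
    count = len(series) - (m - 1) * tau
    if count <= 0:
        raise ValueError("series too short for this embedding dimension and delay")
    return [[series[t + j * tau] for j in range(m)] for t in range(count)]


series = logistic_series(0.2, 4.0, 200)    # r = 4.0 gives chaotic behavior
vectors = delay_embed(series, m=3, tau=1)  # 198 phase-space points of dimension 3
```

Each candidate (m, tau) yields a different set of input vectors; a search like the paper's would build one such set per candidate and keep the one its criterion scores best.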
Funding: Supported by the National Natural Science Foundation of China (62201618).
Abstract: Nano-computed tomography (Nano-CT) is an emerging high-resolution imaging technique. However, because of its low-light properties, tabletop Nano-CT must scan under long exposure conditions, which makes the scanning process time-consuming. For 3D reconstruction data, this paper proposes AAD-ResNet (Axial Attention DeNoise ResNet), a lightweight 3D noise reduction method for desktop Nano-CT. The network is built on a U-Net structure, and its encoder and decoder incorporate the proposed 3D axial attention mechanism and residual dense blocks. Each layer of a residual dense block can directly access the features of the previous layer, which reduces parameter redundancy and improves training efficiency. The 3D axial attention mechanism strengthens the correlation between 3D information during training and captures long-range dependencies, improving noise reduction while avoiding the loss of structural detail. Experimental results show that the network can raise the image quality of a 0.1-s exposure scan to a level close to that of a 3-s exposure, significantly shortening sample scanning time.
Funding: Bai for their strong support for this work. This study was supported by the Natural Science Foundation of Henan Province (No. 222301420113, 232102520006), the Major Science and Technology Special Project of Henan Province (No. 221100210600), the Henan Province Key Research and Development Project (No. 231111110100), the Key Scientific and Technological Project of Henan Province (No. 242102111193), and the Natural Science Foundation of China (No. 31501225, 42101362).
Abstract: Fusarium head blight (FHB) is one of the most destructive diseases in global wheat production. To count FHB-infected wheat ears under field conditions, this study proposes a diseased wheat ear detection algorithm based on an improved YOLOv5s (Tr-YOLOv5s). The Swin Transformer replaces the CSPDarknet backbone network to enhance the extraction of features of FHB-infected wheat ear populations against field backgrounds. The convolutional block attention module (CBAM) is added to improve the detection of target wheat ears and thus the overall accuracy of the model. The original complete intersection over union (CIoU) loss is replaced by the Scylla intersection over union (SIoU) loss to accelerate model convergence and decrease the loss value. The mean average precision (mAP) of the Tr-YOLOv5s model reached 90.64%, a 4.63% improvement over the original YOLOv5s model. The improved model can quickly detect and count FHB-infected wheat ears in the field, laying a foundation for subsequent automatic identification and grading of wheat FHB under field conditions.
Funding: Beijing Natural Science Foundation (No. IS23112) and Beijing Institute of Technology Research Fund Program for Young Scholars (No. 6120220236).
Abstract: The intensive application of deep learning in medical image processing has advanced research on automatic retinal vessel segmentation. To overcome the limitation that traditional U-shaped vessel segmentation networks fail to extract sufficient features from fundus images, we propose a novel network (DSeU-net) based on deformable convolution and a squeeze-and-excitation residual module. Deformable convolution dynamically adjusts the receptive field for retinal vessel feature extraction, and the squeeze-and-excitation residual module scales the weights of low-level features so that the network efficiently learns the complex relationships between different feature layers. We validate DSeU-net on three public retinal vessel segmentation datasets (DRIVE, CHASEDB1, and STARE), and the experimental results demonstrate its satisfactory segmentation performance.
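A squeeze-and-excitation (SE) step rescales channels by gates computed from their global averages. The sketch below shows the generic SE computation on a nested-list C x H x W feature map followed by an identity shortcut; the convolutional branch of the real residual module is omitted, and the weight matrices (and therefore the reduction ratio) are hypothetical inputs rather than trained values.

```python
import math


def squeeze_excitation(feature, w1, w2):
    """Rescale each channel of a C x H x W feature map by a learned gate in (0, 1).

    w1: (C/r) x C weights of the reduction layer; w2: C x (C/r) weights of the expansion layer.
    """
    # Squeeze: global average pooling gives one descriptor per channel
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    # Excitation: FC -> ReLU -> FC -> sigmoid
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every value in channel c by gate c
    return [[[v * gates[c] for v in row] for row in ch] for c, ch in enumerate(feature)]


def se_residual(feature, w1, w2):
    """Identity shortcut plus the SE-rescaled features (convolutions omitted)."""
    scaled = squeeze_excitation(feature, w1, w2)
    return [
        [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(feature, scaled)
    ]
```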
Funding: Research Fund of Macao Polytechnic University (RP/FCSD-01/2022).
Abstract: Background: Magnetic resonance imaging (MRI) has played an important role in the rapid growth of medical imaging diagnostic technology, especially in the diagnosis and treatment of brain tumors, owing to its non-invasive nature and superior soft tissue contrast. However, because brain tumors are invasive and highly heterogeneous, they appear highly non-uniform and have indistinct boundaries in MRI images. In addition, labeling tumor areas is time-consuming and laborious. Methods: To address these issues, this study improves the classical U-net segmentation network with a residual grouped convolution module, a convolutional block attention module, and bilinear interpolation upsampling. The influence of network normalization, loss function, and network depth on segmentation performance is also examined. Results: The Dice score of the proposed segmentation model reached 97.581%, 12.438% higher than that of the traditional U-net, demonstrating effective segmentation of MRI brain tumor images. Conclusions: The improved U-net network achieves good segmentation of brain tumor MRI images.
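Bilinear interpolation upsampling, one of the U-net modifications listed above, fills each output pixel from its four nearest input pixels. This plain-Python sketch assumes an align-corners coordinate mapping (the corner pixels of input and output coincide); other conventions, such as half-pixel centers, give slightly different values, and the abstract does not say which one the authors used.

```python
def bilinear_resize(img, out_h, out_w):
    """Resize a 2D grid with bilinear interpolation (align-corners coordinate mapping)."""
    in_h, in_w = len(img), len(img[0])

    def src_coord(i, n_out, n_in):
        # Map output index i onto the input grid so that corner pixels line up
        return 0.0 if n_out == 1 else i * (n_in - 1) / (n_out - 1)

    out = []
    for i in range(out_h):
        y = src_coord(i, out_h, in_h)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = src_coord(j, out_w, in_w)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate along x on the two neighboring rows, then along y
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


print(bilinear_resize([[0, 2], [4, 6]], 3, 3))
# [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
```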
Funding: Supported by the Henan Provincial Science and Technology Research Project under Grants 232102211006, 232102210044, 232102211017, 232102210055, and 222102210214, the Science and Technology Innovation Project of Zhengzhou University of Light Industry under Grant 23XNKJTD0205, the Undergraduate Universities Smart Teaching Special Research Project of Henan Province under Grant Jiao Gao [2021] No. 489-29, and the Doctor Natural Science Foundation of Zhengzhou University of Light Industry under Grants 2021BSJJ025 and 2022BSJJZK13.
Abstract: Multispectral pedestrian detection technology leverages infrared images to supplement visible light images with reliable information, offering significant advantages in low-light conditions and background occlusion scenarios. However, maintaining the model's detection speed while continuously improving cross-modal feature extraction and fusion remains a challenging issue. We have devised a ResNet50-based deep learning network for cross-modal pedestrian detection that focuses on more reliable features and improves detection efficiency. The model employs a spatial attention mechanism to reweight the input visible light and infrared image data, strengthening its focus on different spatial positions, and shares the weighted feature data across modalities, thereby reducing interference between multimodal features. Lightweight modules with depthwise separable convolution are then incorporated, reducing the parameter count and computational load through channel-wise and point-wise convolutions. The proposed algorithm was validated on the publicly available KAIST dataset and compared with existing methods. The experimental results show that our approach performs favorably in various complex environments, affirming the effectiveness of the proposed multispectral pedestrian detection technology.
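The savings from depthwise separable convolution follow from a simple parameter count. The sketch below compares a standard k x k convolution with a depthwise k x k convolution followed by a 1 x 1 pointwise convolution, ignoring biases; the channel sizes are hypothetical, since the abstract does not list the model's layer widths.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k standard convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


std = standard_conv_params(3, 64, 128)         # 73728
sep = depthwise_separable_params(3, 64, 128)   # 576 + 8192 = 8768
print(std, sep, round(sep / std, 3))           # 73728 8768 0.119
```

The ratio works out to 1/c_out + 1/k^2, so for 3 x 3 kernels the separable version needs roughly 8 to 9 times fewer weights once c_out is large; multiply-add counts per output position shrink by the same factor.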
文摘Class Title:Radiological imaging method a comprehensive overview purpose.This GPT paper provides an overview of the different forms of radiological imaging and the potential diagnosis capabilities they offer as well as recent advances in the field.Materials and Methods:This paper provides an overview of conventional radiography digital radiography panoramic radiography computed tomography and cone-beam computed tomography.Additionally recent advances in radiological imaging are discussed such as imaging diagnosis and modern computer-aided diagnosis systems.Results:This paper details the differences between the imaging techniques the benefits of each and the current advances in the field to aid in the diagnosis of medical conditions.Conclusion:Radiological imaging is an extremely important tool in modern medicine to assist in medical diagnosis.This work provides an overview of the types of imaging techniques used the recent advances made and their potential applications.
Funding: Supported by the Shanghai Rising-Star Program (No. 22QA1403900), the National Natural Science Foundation of China (No. 71804106), and the Noncarbon Energy Conversion and Utilization Institute under the Shanghai Class IV Peak Disciplinary Development Program.
Abstract: Accurate load forecasting is a crucial foundation for implementing household demand response plans and optimizing load scheduling. For short-term load data with substantial fluctuations, a single prediction model struggles to capture temporal features effectively, which reduces prediction accuracy. In this study, a hybrid deep learning framework that integrates an attention mechanism, a convolutional neural network (CNN), improved chaotic particle swarm optimization (ICPSO), and long short-term memory (LSTM) is proposed for short-term household load forecasting. First, the CNN extracts features from the original data, improving the quality of the data features. Next, the moving average method is used for data preprocessing, and the LSTM network predicts the processed data. The ICPSO algorithm is introduced to optimize the LSTM parameters, improving the model's running speed and accuracy. Finally, the attention mechanism optimizes the output of the LSTM, effectively addressing the information loss that lengthy sequences induce in LSTM and further improving prediction accuracy. Numerical analysis verifies the accuracy and effectiveness of the proposed hybrid model: it explores data features adeptly and achieves higher prediction accuracy than other forecasting methods for household loads that fluctuate significantly across seasons.
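The moving-average preprocessing step can be sketched as a trailing window mean. The window size and the load readings below are hypothetical; the abstract does not give the window used, and the CNN, ICPSO, LSTM, and attention stages are not reproduced here.

```python
def moving_average(series, window):
    """Simple (trailing) moving average; output has len(series) - window + 1 points."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    total = sum(series[:window])
    out = [total / window]
    for i in range(window, len(series)):
        # Slide the window: add the newest value, drop the oldest
        total += series[i] - series[i - window]
        out.append(total / window)
    return out


hourly_load = [1.2, 1.5, 3.9, 1.4, 1.3, 4.1]  # hypothetical kW readings
smoothed = moving_average(hourly_load, window=3)
```

The running-sum update keeps the cost linear in the series length; recomputing each window from scratch would give the same values at window-times the cost.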
Funding: Supported by the National Natural Science Foundation of China (No. 61976080), the Academic Degrees & Graduate Education Reform Project of Henan Province (No. 2021SJGLX195Y), the Teaching Reform Research and Practice Project of Henan Undergraduate Universities (No. 2022SYJXLX008), and the Key Project on Research and Practice of Henan University Graduate Education and Teaching Reform (No. YJSJG2023XJ006).
Abstract: Unsupervised multi-modal image translation is an emerging area of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain. However, advanced existing approaches employ multiple generators to model different domain mappings, which makes neural network training inefficient and leads to mode collapse, limiting the diversity of the generated images. To address this issue, this paper introduces a multi-modal unsupervised image translation framework that uses a single generator to perform multi-modal image translation. Specifically, first, a domain code is introduced to explicitly control the different generation tasks. Second, the squeeze-and-excitation (SE) mechanism and a feature attention (FA) module are incorporated. Finally, the model integrates multiple optimization objectives to ensure efficient multi-modal translation. Qualitative and quantitative experiments on multiple unpaired benchmark image translation datasets demonstrate the benefits of the proposed method over existing techniques. Overall, the experimental results show that the proposed method is versatile and scalable.
Funding: Supported by the Key Project of Zhejiang Provincial Natural Science Foundation under Grants LD21F020001 and Z20F020022, the National Natural Science Foundation of China under Grants 62072340 and 62076185, and the Major Project of Wenzhou Natural Science Foundation under Grants 2021HZSY0071 and ZS2022001.
Abstract: Various deep learning models have been proposed for accurate assisted diagnosis of early-stage Alzheimer's disease (AD). Most studies employ convolutional neural networks (CNNs), which focus solely on local features and therefore have difficulty handling global features. In contrast to natural images, structural magnetic resonance imaging (sMRI) images have more channel dimensions; however, the Position Embedding stage of Multi-Head Self-Attention (MHSA) disregards coded information related to the channel dimension. To tackle these issues, we propose RepBoTNet-CESA, an advanced AD-aided diagnostic model that learns local and global features simultaneously. It combines the strength of CNNs in capturing local information with that of Transformers in integrating global information, reducing computational cost while achieving excellent classification performance. Moreover, it uses the Cubic Embedding Self Attention (CESA) proposed in this paper to incorporate channel code information, enhancing classification performance within the Transformer structure. RepBoTNet-CESA performs well on various AD-aided diagnosis tasks, with an accuracy of 96.58%, precision of 97.26%, and recall of 96.23% on the AD/NC task; an accuracy of 92.75%, precision of 92.84%, and recall of 93.18% on the EMCI/NC task; and an accuracy of 80.97%, precision of 83.86%, and recall of 80.91% on the AD/EMCI/LMCI/NC task. Our study also shows that MHSA improves ResNet performance more than conventional attention mechanisms do, and that a deeper RepBoTNet-CESA network brings no further improvement on AD-aided diagnostic tasks.