The advent of self-attention mechanisms in Transformer models has significantly advanced deep learning, yielding outstanding results across diverse domains. Nonetheless, self-attention falters on datasets with intricate semantic content and extensive dependency structures. In response, this paper introduces a Diffusion Sampling and Label-Driven Co-attention Neural Network (DSLD), which adopts a diffusion sampling method to capture more comprehensive semantic information from the data. Additionally, the model brings the joint correlation between labels and data into the computation of the text representation, correcting semantic representation biases in the data and increasing the accuracy of the semantic representation. Finally, the model computes the classification results by synthesizing these rich semantic representations. Experiments on seven benchmark datasets show that the proposed model achieves competitive results compared with state-of-the-art methods.
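A lithium battery state-of-charge (SOC) estimation method based on SABO-GRU-Attention (subtraction-average-based optimizer, gated recurrent unit, attention) is proposed. The subtraction-average-based optimizer adaptively updates the hyperparameters of the GRU neural network, and an SE (squeeze-and-excitation) attention mechanism adaptively assigns a weight to each channel to improve learning efficiency. The University of Maryland battery dataset is preprocessed, with voltage and current as input parameters, for lithium battery charge-discharge simulation experiments, and an experimental SOC platform is built for charge-discharge experiments on energy-storage lithium batteries. The results show that the proposed SOC neural network estimation model clearly outperforms the LSTM, GRU, and PSO-GRU models and offers high estimation accuracy and practical value.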
Recently, deep learning-based image inpainting methods have made great strides in reconstructing damaged regions. However, these methods often struggle to produce satisfactory results on images with large holes, leading to distorted structures and blurred textures. To address these problems, we combine the advantages of transformers and convolutions in an image inpainting method that incorporates edge priors and attention mechanisms. The method aims to improve the inpainting of large holes by enhancing the accuracy of structure restoration and the recovery of texture details. It divides the inpainting task into two phases: edge prediction and image inpainting. In the edge prediction phase, a transformer architecture combines axial attention with standard self-attention. This design strengthens the extraction of global structural features and location awareness while keeping the complexity of the self-attention operations in check, resulting in accurate prediction of the edge structure in the defective region. In the image inpainting phase, a multi-scale fusion attention module is introduced. This module makes full use of multi-level distant features and enhances local pixel continuity, thereby significantly improving inpainting quality. To evaluate the method, comparative experiments are conducted on several datasets, including CelebA, Places2, and Facade. Quantitative experiments show that the method outperforms other mainstream methods. Specifically, it improves Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) by 1.141 to 3.234 dB and 0.083 to 0.235, respectively, and it reduces Learned Perceptual Image Patch Similarity (LPIPS) and Mean Absolute Error (MAE) by 0.0347 to 0.1753 and 0.0104 to 0.0402, respectively. Qualitative experiments show that the method excels at reconstructing images with complete structural information and clear texture details. Furthermore, the model performs well in terms of parameter count, memory cost, and testing time.
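The PSNR and MAE figures quoted above follow standard definitions, which can be sketched as below. This is a minimal illustration, not the authors' evaluation code: it assumes images are given as flat lists of pixel values and that the peak value is 255 (8-bit images); the function names `psnr` and `mae` are chosen here for illustration.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flat lists of pixel values (an assumption of this sketch).
    """
    if len(reference) != len(restored) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mae(reference, restored):
    """Mean absolute error between two equally sized images."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    return sum(abs(r - s) for r, s in zip(reference, restored)) / len(reference)
```

Higher PSNR and lower MAE both indicate a restored image closer to the reference, which is why the abstract reports PSNR as an increase and MAE as a reduction.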
Structured illumination microscopy (SIM) is a popular and powerful super-resolution (SR) technique in biomedical research. However, the conventional reconstruction algorithm for SIM relies heavily on accurate prior knowledge of the illumination patterns and on the signal-to-noise ratio (SNR) of the raw images. To obtain high-quality SR images, several raw images must be captured at a high fluorescence level, which further restricts SIM's temporal resolution and its applications. Deep learning (DL) is a data-driven technology that has been used to expand the limits of optical microscopy. In this study, we propose a deep neural network based on multi-level wavelet and attention mechanism (MWAM) for SIM. Our results show that the MWAM network can extract the high-frequency information contained in SIM raw images and accurately integrate it into the output image, producing SR images superior to those generated with wide-field images as input. We also demonstrate that the number of SIM raw images can be reduced to three, one per illumination orientation, to achieve the optimal tradeoff between temporal and spatial resolution. Furthermore, the MWAM network reconstructs low-SNR images better than conventional SIM algorithms. We have also analyzed the adaptability of the network to other biological samples and successfully applied the pretrained model to other SIM systems.
Object detection has made a significant leap forward in recent years. However, detecting small objects remains difficult for several reasons: the objects are very small, and they are easily missed because of background noise. In addition, downsampling operations degrade small-object information. Deep learning-based detection methods have been used to address the challenge posed by small objects. In this work, we propose a novel method, the Multi-Convolutional Block Attention Network (MCBAN), which increases the detection accuracy for minute objects by countering the loss of information during downsampling. The multi-convolutional attention block (MCAB), together with the channel attention and spatial attention module (SAM) that make it up, is designed to detect small objects with higher precision. We carried out experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset and the Pattern Analysis, Statistical Modelling and Computational Learning (PASCAL) Visual Object Classes (VOC) dataset and followed a step-wise process to analyze the results. The experimental results demonstrate significant performance gains, with 97.75% on KITTI and 88.97% on PASCAL VOC. The findings indicate that MCBAN is considerably more effective at small object detection than existing approaches.
The detection of foreign object intrusion is crucial for ensuring the safety of railway operations. To address the low efficiency, suboptimal detection accuracy, and slow detection speed of conventional comprehensive video monitoring systems for railways, a railway foreign object intrusion recognition and detection system is designed and implemented using edge computing and deep learning. To raise detection accuracy, the convolutional block attention module (CBAM), which comprises spatial and channel attention modules, is integrated into the YOLOv5 model, giving the CBAM-YOLOv5 model. Furthermore, the distance intersection-over-union non-maximum suppression (DIoU_NMS) algorithm replaces the weighted non-maximum suppression algorithm, improving detection performance for intruding targets. To accelerate detection, the model is pruned based on the batch normalization (BN) layer and TensorRT inference acceleration is applied, allowing the algorithm to be deployed on edge devices. The CBAM-YOLOv5 model improves detection accuracy by 2.1% on a self-constructed railway dataset, reaching a mean average precision (mAP) of 95.0%. The inference speed on edge devices reaches 15 frames/s.
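The DIoU_NMS step mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: boxes are `(x1, y1, x2, y2)` tuples, the suppression threshold of 0.5 is an arbitrary choice, and the function names are hypothetical. DIoU subtracts from the plain IoU a penalty equal to the squared distance between box centers divided by the squared diagonal of the smallest enclosing box, so overlapping boxes whose centers are far apart are less likely to be suppressed.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def diou(a, b):
    """Distance-IoU: IoU minus the normalized squared center distance."""
    # Squared distance between the two box centers.
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
        + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return iou(a, b) - (d2 / c2 if c2 > 0 else 0.0)


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses a box when its DIoU with a kept box exceeds threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box
        keep.append(best)
        order = [i for i in order if diou(boxes[best], boxes[i]) <= threshold]
    return keep
```

The function returns the indices of the retained boxes in descending score order.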
BACKGROUND: Attention deficit hyperactivity disorder (ADHD) is a common mental and behavioral disorder among children. AIM: To explore the main concerns of parents of children with ADHD and the effectiveness of early clinical screening. METHODS: This study found that the main reasons parents sought medical help were short attention span for children under 7 years old (16.6%) and poor academic performance for children over 7 years old (12.1%). We employed a two-stage experiment to diagnose ADHD. Among the 5683 children evaluated from 2018 to 2021, 360 met the DSM-5 criteria. Those diagnosed with ADHD underwent assessments of letter, number, and figure attention. After ADHD-H diagnoses were excluded, the detection rate rose to 96.0%, with 310 of 323 cases identified. RESULTS: This study yielded insights into the primary concerns of parents regarding their children's symptoms and validated the efficacy of a straightforward diagnostic test, offering guidance for directing ADHD treatment, facilitating early detection, and enabling timely intervention. Our research examined the predominant worries of parents across age groups. Furthermore, we demonstrated the precision of the simple exclusion experiment in distinguishing between ADHD-I and ADHD-C in children. CONCLUSION: Our study will help diagnose ADHD and guide future treatment directions.
The fluctuation of wind power affects the operating safety and power consumption of the electric power grid and restricts the large-scale grid connection of wind power. Wind power forecasting therefore plays a key role in improving the safety and economic benefits of the power grid. This paper proposes a wind power prediction method based on a convolutional graph attention deep neural network with multi-wind-farm data. Using the graph attention network and attention mechanism, the method extracts spatial-temporal characteristics from the data of multiple wind farms. Then, combined with a deep neural network, a convolutional graph attention deep neural network model is constructed. Finally, the model is trained with the quantile regression loss function to achieve deterministic and probabilistic wind power prediction based on multi-wind-farm spatial-temporal data. A wind power dataset from the U.S. is used to demonstrate the efficacy of the proposed model. Compared with the selected baseline methods, the proposed model achieves the best prediction performance. The point prediction errors, i.e., root mean square error (RMSE) and normalized mean absolute percentage error (NMAPE), are 0.304 MW and 1.177%, respectively. The comprehensive performance of probabilistic prediction, i.e., the continuous ranked probability score (CRPS), is 0.580. These results demonstrate the value of multi-wind-farm data and of the spatial-temporal feature extraction module.
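The quantile regression loss named above is commonly implemented as the pinball loss, sketched below. This is a generic illustration, not the paper's training code: the function name and the list-based inputs are assumptions. For a quantile level q, under-prediction is penalized with weight q and over-prediction with weight 1 - q, so minimizing the loss at several levels of q yields the quantiles used for probabilistic prediction, while q = 0.5 gives a median (point) forecast.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for quantile level q in (0, 1)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # diff > 0 (under-prediction) costs q * diff;
        # diff < 0 (over-prediction) costs (1 - q) * |diff|.
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)
```

With q = 0.9, predicting 8 when the true value is 10 costs 1.8, while predicting 12 costs only 0.2, which pushes the fitted value toward the upper tail of the distribution.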
The semantic segmentation of very high spatial resolution remote sensing images is difficult because of the complexity of interpreting the interactions between the objects in the scene. Effective segmentation requires considering both local spatial context and long-range dependencies. To address this problem, the proposed approach is inspired by the MAC-UNet network, a densely connected extension of U-Net combined with channel attention. The advantages of this solution are as follows: 1) The new model introduces a new attention mechanism, called propagate attention, to build an attention-based encoder. 2) Multi-scale information is fused by a weighted linear combination of the attentions, whose coefficients are learned during training. 3) The decoder introduces the Spatial-Channel-Global-Local block, an attention layer that combines channel attention and spatial attention both locally and globally. The model is evaluated on two datasets, WHDLD and DLRSD, and improves the mean intersection over union (mIoU) by between 1.54% and 10.47% on DLRSD and by between 1.04% and 4.37% on WHDLD compared with the most efficient attention-based algorithms, such as MAU-Net, and transformers, such as TMNet.
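The mIoU metric reported above can be computed as in the sketch below. It is a generic illustration rather than the authors' evaluation script: labels are assumed to be flat lists of integer class indices, and classes that appear in neither the ground truth nor the prediction are skipped (a common convention, but an assumption here).

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection over union across classes for flat label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have equal length")
    ious = []
    for c in range(num_classes):
        # Pixels labeled c in both maps, and in at least one of them.
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with ground truth `[0, 0, 1, 1]` and prediction `[0, 1, 1, 1]`, class 0 scores 1/2 and class 1 scores 2/3, so the mean is 7/12.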
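To address the lack of correlation among classification labels in existing multi-label classification methods for digitized archives, a deep neural network model for archive multi-label classification, ALBERT-Seq2Seq-Attention, is proposed. The model extracts text feature vectors and obtains contextual semantic information through the multi-layer bidirectional Transformer structure inside the ALBERT (A Lite BERT) pretrained language model. The text features extracted by pretraining serve as the input sequence of the Seq2Seq-Attention (Sequence to Sequence-Attention) model, and a label dictionary is built to capture the relationships among multiple labels. Comparative experiments on three datasets show that the model's F1 score exceeds 90% on each. The model not only improves multi-label classification of archive text but also attends to the correlations between labels.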
Regular exercise is a crucial aspect of daily life, as it enables individuals to stay physically active, lowers the likelihood of developing illnesses, and enhances life expectancy. The recognition of workout actions in video streams is important in computer vision research, as it aims to enhance exercise adherence, enable instant recognition, advance fitness tracking technologies, and optimize fitness routines. However, existing action datasets often lack diversity and specificity for workout actions, hindering the development of accurate recognition models. To address this gap, the Workout Action Video dataset (WAVd) has been introduced. WAVd comprises a diverse collection of labeled workout action videos, curated to cover various exercises performed by numerous individuals in different settings. This research proposes a framework based on the Attention-driven Residual Deep Convolutional-Gated Recurrent Unit (ResDC-GRU) network for workout action recognition in video streams. Unlike image-based action recognition, videos contain spatio-temporal information, making the task more complex and challenging. While substantial progress has been made in this area, challenges persist in detecting subtle and complex actions, handling occlusions, and managing the computational demands of deep learning approaches. The proposed ResDC-GRU Attention model achieved 95.81% accuracy in classifying workout action videos and outperformed various state-of-the-art models. The method also yielded 81.6%, 97.2%, 95.6%, and 93.2% accuracy on the established benchmark datasets HMDB51, YouTube Actions, UCF50, and UCF101, respectively, demonstrating its robustness in action recognition. The findings suggest practical implications for real-world scenarios where precise video action recognition is paramount. The WAVd dataset serves as a catalyst for the development of more robust and effective fitness tracking systems and ultimately promotes healthier lifestyles through improved exercise monitoring and analysis.
Attention is a scarce resource in decentralized autonomous organizations (DAOs), as their self-governance relies heavily on the attention-intensive decision-making process of "proposal and voting". To prevent the negative effects of proposers' attention-capturing strategies, which contribute to the "tragedy of the commons", and to ensure an efficient distribution of attention among multiple proposals, it is necessary to establish a market-driven allocation scheme for DAOs' attention. First, Harberger tax-based attention markets are designed to allocate attention via continuous and automated trading, adopting an individualized Harberger tax rate (HTR) determined by each proposer's reputation. Then, a Stackelberg game model is formulated for these markets, casting attention owners as leaders and other competing proposers as followers. Its equilibrium trading strategies are also discussed to unravel the dynamics of attention pricing. Moreover, using the single-round Stackelberg game as an illustrative example, the existence of Nash equilibrium trading strategies is demonstrated. Finally, the impact of the individualized HTR on trading strategies is investigated. The results suggest that it is negatively correlated with leaders' self-assessed prices and ownership duration, but that its effect on their revenues varies under different conditions. This study is expected to provide insights into leveraging attention resources to improve DAOs' governance and decision-making process.
Completing defect classification with only a few defect samples is a key challenge in the production of mobile phone screens. This paper proposes an attention-relation network for mobile phone screen defect classification. The architecture contains two modules: a feature extraction module and a feature metric module. Unlike other few-shot models, our model applies an attention mechanism to metric learning when measuring the distance between features, so as to attend to the correlation between features and suppress unwanted information. In addition, we combine dilated convolution and skip connections to extract more feature information for subsequent processing. We validate the attention-relation network on a mobile phone screen defect dataset. The experimental results show that its classification accuracy is 0.9486 under the 5-way 1-shot training strategy and 0.9039 under the 5-way 5-shot setting. It classifies mobile phone screen defects well and clearly outperforms the compared models.
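The "5-way 1-shot" and "5-way 5-shot" settings above refer to episodic few-shot evaluation, in which each episode draws N classes with K labeled support examples per class plus some held-out query examples. The sketch below shows one way to sample such an episode. It is illustrative only: the dataset layout (a dict mapping a class label to a list of samples), the function name `sample_episode`, and the `n_query` parameter are assumptions, not details from the paper.

```python
import random


def sample_episode(dataset, n_way=5, k_shot=1, n_query=1, rng=None):
    """Sample one N-way K-shot episode as (support, query) lists of (sample, label)."""
    rng = rng or random.Random()
    # Only classes with enough samples for both support and query sets qualify.
    eligible = [label for label, items in dataset.items()
                if len(items) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient samples")
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for label in classes:
        # Draw disjoint support and query samples for this class.
        picked = rng.sample(dataset[label], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query
```

A metric-based classifier then labels each query sample by comparing its features with those of the support samples, and accuracy is averaged over many episodes.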
Early screening for diabetic retinopathy (DR) plays an important role in preventing irreversible blindness. Existing research has not fully exploited the effective DR lesion information in fundus images. In addition, traditional attention schemes have not considered the impact of lesion type differences on grading, resulting in poor extraction of important lesion features. This paper therefore proposes a DR diagnosis scheme that integrates a multi-level patch attention generator (MPAG) and a lesion localization module (LLM). First, the MPAG predicts patches of different sizes and generates a weighted attention map based on the prediction score and the types of lesions contained in the patches. This fully accounts for the impact of lesion type differences on grading and solves the problem that lesion attention maps could not be further refined and adapted to the final DR diagnosis task. Second, the LLM generates a global attention map based on localization. Finally, the weighted attention map and the global attention map are weighted with the fundus image to fully exploit effective DR lesion information and increase the classification network's attention to lesion details. Extensive experiments on the public DDR dataset demonstrate the effectiveness of the proposed method, which obtains an accuracy of 0.8064.
When underwater robots perform target detection tasks, the color distortion and uneven quality of underwater images make feature extraction difficult for the model, which leads to false detections, missed detections, and poor accuracy. This paper therefore proposes the CER-YOLOv7 (CBAM-EIOU-RepVGG-YOLOv7) underwater target detection algorithm. To improve the algorithm's ability to retain valid features from both spatial and channel perspectives during feature extraction, a Convolutional Block Attention Module (CBAM) is added to the backbone network. The Reparameterization Visual Geometry Group (RepVGG) module is inserted into the backbone to improve training and inference. The Efficient Intersection over Union (EIoU) loss is used as the localization loss function, which reduces the algorithm's false detection and missed detection rates. Experimental results on the UPRC (Underwater Robot Prototype Competition) dataset show that the algorithm reaches a mean Average Precision (mAP) of 86.1%, a 2.2% improvement over YOLOv7. Ablation and comparison experiments confirm the feasibility and validity of CER-YOLOv7 and show that it is better suited to underwater target detection.
Traditional feature-based image stitching techniques often encounter obstacles when dealing with images that lack distinctive features or suffer from quality degradation. The scarcity of annotated datasets for real-life scenes severely undermines the reliability of supervised learning methods in image stitching. Furthermore, existing deep learning architectures designed for image stitching are often too bulky to be deployed on mobile and peripheral computing devices. To address these challenges, this study proposes a novel unsupervised image stitching method based on the YOLOv8 (You Only Look Once version 8) framework that introduces deep homography networks and attention mechanisms. The methodology is divided into three stages. The first stage combines the attention mechanism with a pooling pyramid model to enhance the detection and recognition of compact objects in images, while the deep homography network module estimates the global homography of the input images across multiple viewpoints. The second stage performs preliminary stitching of the masks generated in the first stage, with further enhancement through weighted computation to eliminate common stitching artifacts. The final stage adaptively reconstructs and refines the initial stitching results. Comprehensive experiments across multiple datasets are conducted to assess the proposed model. The method improves Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) by 10.6% and 6%, respectively. These experimental results confirm the efficacy and utility of the model presented in this paper.
Funding (DSLD co-attention paper): the Communication University of China (CUC230A013) and the Fundamental Research Funds for the Central Universities.
Funding (image inpainting paper): supported in part by the National Natural Science Foundation of China under Grant 62062061 and in part by the Major Project Cultivation Fund of Xizang Minzu University under Grant 324112300447.
Funding (SIM reconstruction paper): supported by the National Natural Science Foundation of China (Grant Nos. 62005307 and 61975228).
Funding (MCBAN small object detection paper): funded by Yayasan UTP FRG (YUTP-FRG), grant number 015LC0-280, and the Computer and Information Science Department of Universiti Teknologi PETRONAS.
Funding (railway intrusion detection paper): supported in part by the Science and Technology Innovation Project of CHN Energy Shuo Huang Railway Development Company Ltd (No. SHTL-22-28), the Beijing Natural Science Foundation Fengtai Urban Rail Transit Frontier Research Joint Fund (No. L231002), and the Major Project of China State Railway Group Co., Ltd. (No. K2023T003).
Abstract: The detection of foreign object intrusion is crucial for ensuring the safety of railway operations. To address challenges such as low efficiency, suboptimal detection accuracy, and slow detection speed inherent in conventional comprehensive video monitoring systems for railways, a railway foreign object intrusion recognition and detection system is designed and implemented using edge computing and deep learning technologies. To raise detection accuracy, the convolutional block attention module (CBAM), which includes spatial and channel attention modules, is integrated into the YOLOv5 model, giving the CBAM-YOLOv5 model. Furthermore, the distance intersection-over-union non-maximum suppression (DIoU_NMS) algorithm is employed in place of the weighted non-maximum suppression algorithm, resulting in improved detection performance for intrusive targets. To accelerate detection, the model is pruned based on the batch normalization (BN) layer, and TensorRT inference acceleration is employed, leading to the successful deployment of the algorithm on edge devices. The CBAM-YOLOv5 model shows a 2.1% improvement in detection accuracy when evaluated on a self-constructed railway dataset, achieving a mean average precision (mAP) of 95.0%. Furthermore, the inference speed on edge devices reaches 15 frames/s.
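As an illustration of the DIoU_NMS step named in this abstract, the following is a minimal pure-Python sketch of the commonly published formulation, in which the DIoU score is the IoU minus the squared center distance normalized by the squared diagonal of the smallest enclosing box. It is not the authors' implementation; the function names `diou` and `diou_nms`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are assumptions made for this example.

```python
def diou(box_a, box_b):
    """Distance-IoU of two boxes given as (x1, y1, x2, y2). Hypothetical helper."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union areas
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centers
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2
    return iou - d2 / c2 if c2 > 0 else iou


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses a box when its DIoU with a kept box reaches the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if diou(boxes[best], boxes[i]) < threshold]
    return keep
```

Because the center-distance penalty lowers the score of boxes whose centers are far apart, two overlapping boxes that belong to different nearby targets are less likely to suppress each other than under plain IoU-based NMS.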
Abstract: BACKGROUND: Attention deficit hyperactivity disorder (ADHD) is a common mental and behavioral disorder among children. AIM: To explore the main concerns of parents of children with ADHD and the effectiveness of early clinical screening. METHODS: This study found that the main reasons parents sought medical help were short attention span for children under 7 years old (16.6%) and poor academic performance for children over 7 years old (12.1%). We employed a two-stage experiment to diagnose ADHD. Among the 5683 children evaluated from 2018 to 2021, 360 met the DSM-5 criteria. Those diagnosed with ADHD underwent assessments for letter, number, and figure attention. Following the exclusion of ADHD-H diagnoses, the detection rate rose to 96.0%, with 310 out of 323 cases identified. RESULTS: This study yielded insights into the primary concerns of parents regarding their children's symptoms and validated the efficacy of a straightforward diagnostic test, offering guidance for directing ADHD treatment, facilitating early detection, and enabling timely intervention. Our research examined the predominant worries of parents across age groups. Furthermore, we demonstrated the precision of the simple exclusion experiment in distinguishing between ADHD-I and ADHD-C in children. CONCLUSION: Our study will help diagnose ADHD and guide future treatment directions.
Funding: Supported by the Science and Technology Project of State Grid Corporation of China (4000-202122070A-0-0-00).
Abstract: The fluctuation of wind power affects the operating safety and power consumption of the electric power grid and restricts the large-scale grid connection of wind power. Therefore, wind power forecasting plays a key role in improving the safety and economic benefits of the power grid. This paper proposes a wind power prediction method based on a convolutional graph attention deep neural network with multi-wind farm data. Based on the graph attention network and attention mechanism, the method extracts spatial-temporal characteristics from the data of multiple wind farms. Then, combined with a deep neural network, a convolutional graph attention deep neural network model is constructed. Finally, the model is trained with the quantile regression loss function to achieve deterministic and probabilistic wind power prediction based on multi-wind farm spatial-temporal data. A wind power dataset from the U.S. is taken as an example to demonstrate the efficacy of the proposed model. Compared with the selected baseline methods, the proposed model achieves the best prediction performance. The point prediction errors, i.e., root mean square error (RMSE) and normalized mean absolute percentage error (NMAPE), are 0.304 MW and 1.177%, respectively. The comprehensive performance of probabilistic prediction, i.e., the continuous ranked probability score (CRPS), is 0.580. These results demonstrate the significance of multi-wind farm data and the spatial-temporal feature extraction module.
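The quantile regression loss mentioned in this abstract is usually the pinball loss, which penalizes under-prediction by a factor q and over-prediction by a factor (1 - q). A minimal sketch follows, assuming that standard definition; the function name `pinball_loss` and the list-based inputs are illustrative choices, not taken from the paper.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        # q * diff when the model under-predicts, (q - 1) * diff when it over-predicts
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)
```

Training one output per quantile level (for example 0.05, 0.5, and 0.95) gives a point forecast from the median together with a prediction interval from the outer quantiles, which is one common way to obtain deterministic and probabilistic forecasts from the same network.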
Abstract: The semantic segmentation of very high spatial resolution remote sensing images is difficult because of the complexity of interpreting the interactions between the objects in the scene. Effective segmentation requires considering both local spatial context and long-range dependencies. To address this problem, the proposed approach is inspired by the MAC-UNet network, a densely connected extension of U-Net combined with channel attention. The advantages of this solution are as follows: 1) The new model introduces a new attention, called propagate attention, to build an attention-based encoder. 2) The fusion of multi-scale information is achieved by a weighted linear combination of the attentions, whose coefficients are learned during the training phase. 3) The decoder introduces the Spatial-Channel-Global-Local block, an attention layer that uniquely combines channel attention and spatial attention locally and globally. The performance of the model is evaluated on two datasets, WHDLD and DLRSD, and shows improvements in mean intersection over union (mIoU) of between 1.54% and 10.47% for DLRSD and between 1.04% and 4.37% for WHDLD compared with the most efficient algorithms with attention mechanisms, such as MAU-Net, and transformers, such as TMNet.
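For readers unfamiliar with the mIoU metric reported here, a minimal sketch of the usual definition is given below: the per-class intersection over union, averaged over the classes that occur in either the prediction or the ground truth. The function name `mean_iou` and the use of flattened label lists are assumptions for this example; the paper's exact evaluation protocol is not stated in the abstract.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in pred or target (flattened label lists)."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], 2)` averages an IoU of 1/2 for class 0 and 2/3 for class 1.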
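Abstract: To address the lack of correlation among classification labels in existing multi-label classification methods for digitized archives, a deep neural network model for archive multi-label classification, ALBERT-Seq2Seq-Attention, is proposed. The model uses the multi-layer bidirectional Transformer structure inside the ALBERT (A Lite BERT) pre-trained language model to extract text feature vectors and obtain contextual semantic information. The text features extracted by the pre-trained model are used as the input sequence of the Seq2Seq-Attention (Sequence to Sequence-Attention) model, and a label dictionary is built to capture the correlations among multiple labels. Comparative experiments on three datasets show that the model's F1 score exceeds 90% on each of them. The model not only improves multi-label classification of archive text but also attends to the correlations between labels.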
Abstract: Regular exercise is a crucial aspect of daily life, as it enables individuals to stay physically active, lowers the likelihood of developing illnesses, and enhances life expectancy. The recognition of workout actions in video streams holds significant importance in computer vision research, as it aims to enhance exercise adherence, enable instant recognition, advance fitness tracking technologies, and optimize fitness routines. However, existing action datasets often lack diversity and specificity for workout actions, hindering the development of accurate recognition models. To address this gap, the Workout Action Video dataset (WAVd) has been introduced as a significant contribution. WAVd comprises a diverse collection of labeled workout action videos, meticulously curated to encompass various exercises performed by numerous individuals in different settings. This research proposes an innovative framework based on the Attention driven Residual Deep Convolutional-Gated Recurrent Unit (ResDC-GRU) network for workout action recognition in video streams. Unlike image-based action recognition, videos contain spatio-temporal information, making the task more complex and challenging. While substantial progress has been made in this area, challenges persist in detecting subtle and complex actions, handling occlusions, and managing the computational demands of deep learning approaches. The proposed ResDC-GRU Attention model demonstrated exceptional classification performance, with 95.81% accuracy in classifying workout action videos, and also outperformed various state-of-the-art models. The method also yielded 81.6%, 97.2%, 95.6%, and 93.2% accuracy on established benchmark datasets, namely HMDB51, YouTube Actions, UCF50, and UCF101, respectively, showcasing its superiority and robustness in action recognition. The findings suggest practical implications in real-world scenarios where precise video action recognition is paramount, addressing the persisting challenges in the field. The WAVd dataset serves as a catalyst for the development of more robust and effective fitness tracking systems and ultimately promotes healthier lifestyles through improved exercise monitoring and analysis.
Funding: Supported by the National Natural Science Foundation of China (62103411) and the Science and Technology Development Fund of Macao SAR (0093/2023/RIA2, 0050/2020/A1).
Abstract: Attention is a scarce resource in decentralized autonomous organizations (DAOs), as their self-governance relies heavily on the attention-intensive decision-making process of “proposal and voting”. To prevent the negative effects of proposers’ attention-capturing strategies that contribute to the “tragedy of the commons” and to ensure an efficient distribution of attention among multiple proposals, it is necessary to establish a market-driven allocation scheme for DAOs’ attention. First, Harberger tax-based attention markets are designed to facilitate its allocation via continuous and automated trading, where an individualized Harberger tax rate (HTR) determined by the proposers’ reputation is adopted. Then, a Stackelberg game model is formulated in these markets, casting attention owners in the role of leaders and other competitive proposers as followers. Its equilibrium trading strategies are also discussed to unravel the intricate dynamics of attention pricing. Moreover, using the single-round Stackelberg game as an illustrative example, the existence of Nash equilibrium trading strategies is demonstrated. Finally, the impact of the individualized HTR on trading strategies is investigated, and the results suggest that it is negatively correlated with leaders’ self-assessed prices and ownership duration, while its effect on their revenues varies under different conditions. This study is expected to provide valuable insights into leveraging attention resources to improve DAOs’ governance and decision-making process.
Abstract: Completing defect classification with only a few defect samples is a key challenge in the production of mobile phone screens. An attention-relation network for mobile phone screen defect classification is proposed in this paper. The architecture of the attention-relation network contains two modules: a feature extraction module and a feature metric module. Unlike other few-shot models, our model applies an attention mechanism to metric learning to measure the distance between features, so as to attend to the correlation between features and suppress unwanted information. In addition, we combine dilated convolution and skip connections to extract more feature information for follow-up processing. We validate the attention-relation network on the mobile phone screen defect dataset. The experimental results show that the classification accuracy of the attention-relation network is 0.9486 under the 5-way 1-shot training strategy and 0.9039 under the 5-way 5-shot setting. It classifies mobile phone screen defects effectively and outperforms the compared methods by a clear margin.
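The "5-way 1-shot" and "5-way 5-shot" settings refer to episodic few-shot training, where each episode draws N classes with K labeled support samples per class plus some query samples to classify. The sketch below shows one common way to sample such an episode; the function `sample_episode`, its parameters, and the dictionary layout are hypothetical and are not taken from the paper.

```python
import random


def sample_episode(samples_by_class, n_way=5, k_shot=1, n_query=1, seed=None):
    """Draw one N-way K-shot episode from a {label: [samples]} dictionary."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(samples_by_class), n_way)
    support, query = [], []
    for label in classes:
        # Draw support and query samples without overlap within the class
        picked = rng.sample(samples_by_class[label], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query
```

A metric-based model such as a relation network then scores each query sample against the support samples of every class and predicts the class with the highest relation score.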
Funding: Supported in part by the Research on the Application of Multimodal Artificial Intelligence in Diagnosis and Treatment of Type 2 Diabetes under Grant No. 2020SK50910, and in part by the Hunan Provincial Natural Science Foundation of China under Grant 2023JJ60020.
Abstract: Early screening of diabetic retinopathy (DR) plays an important role in preventing irreversible blindness. Existing research has failed to fully explore effective DR lesion information in fundus images. In addition, traditional attention schemes have not considered the impact of lesion type differences on grading, resulting in unreasonable extraction of important lesion features. Therefore, this paper proposes a DR diagnosis scheme that integrates a multi-level patch attention generator (MPAG) and a lesion localization module (LLM). First, the MPAG is used to predict patches of different sizes and generate a weighted attention map based on the prediction score and the types of lesions contained in the patches. This fully considers the impact of lesion type differences on grading and solves the problem that the attention maps of lesions cannot be further refined and then adapted to the final DR diagnosis task. Second, the LLM generates a global attention map based on localization. Finally, the weighted attention map and the global attention map are weighted with the fundus image to fully explore effective DR lesion information and increase the attention of the classification network to lesion details. This paper demonstrates the effectiveness of the proposed method through extensive experiments on the public DDR dataset, obtaining an accuracy of 0.8064.
Funding: Scientific Research Fund of Liaoning Provincial Education Department (No. JGLX2021030): Research on Vision-Based Intelligent Perception Technology for the Survival of Benthic Organisms.
Abstract: When underwater robots perform target detection tasks, the color distortion and uneven quality of underwater images make feature extraction difficult for the model, which leads to issues such as false detections, missed detections, and poor accuracy. Therefore, this paper proposes the CER-YOLOv7 (CBAM-EIOU-RepVGG-YOLOv7) underwater target detection algorithm. To improve the algorithm’s ability to retain valid features from both spatial and channel perspectives during the feature extraction phase, a Convolutional Block Attention Module (CBAM) is added to the backbone network. The Reparameterization Visual Geometry Group (RepVGG) module is inserted into the backbone to improve the training and inference capabilities. The Efficient Intersection over Union (EIoU) loss is also used as the localization loss function, which reduces the false detection rate and missed detection rate of the algorithm. The experimental results of the CER-YOLOv7 algorithm on the UPRC (Underwater Robot Prototype Competition) dataset show that the mAP (mean Average Precision) score of the algorithm is 86.1%, a 2.2% improvement over YOLOv7. The feasibility and validity of CER-YOLOv7 are demonstrated through ablation and comparison experiments, which show that it is better suited to underwater target detection.
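As a rough illustration of the EIoU localization loss named above, the sketch below follows the commonly published form: one minus IoU, plus the squared center distance normalized by the squared diagonal of the enclosing box, plus the squared width and height differences normalized by the squared enclosing width and height. It assumes `(x1, y1, x2, y2)` boxes and is not the authors' code; the function name `eiou_loss` and the `eps` stabilizer are choices made for this example.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss for two boxes in (x1, y1, x2, y2) format. Illustrative sketch."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    # IoU term
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / (union + eps)
    # Width and height of the smallest enclosing box
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    # Squared center distance, and squared width/height differences
    center = ((px1 + px2 - tx1 - tx2) / 2) ** 2 + ((py1 + py2 - ty1 - ty2) / 2) ** 2
    dw = ((px2 - px1) - (tx2 - tx1)) ** 2
    dh = ((py2 - py1) - (ty2 - ty1)) ** 2
    return (1.0 - iou
            + center / (cw ** 2 + ch ** 2 + eps)
            + dw / (cw ** 2 + eps)
            + dh / (ch ** 2 + eps))
```

The loss is close to zero for a perfect match and exceeds one for disjoint boxes, so it still provides a gradient signal when the predicted and ground-truth boxes do not overlap.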
Funding: Science and Technology Research Project of Henan Province (222102240014).
Abstract: Traditional feature-based image stitching techniques often encounter obstacles when dealing with images that lack unique attributes or suffer from quality degradation. The scarcity of annotated datasets in real-life scenes severely undermines the reliability of supervised learning methods in image stitching. Furthermore, existing deep learning architectures designed for image stitching are often too bulky to be deployed on mobile and peripheral computing devices. To address these challenges, this study proposes a novel unsupervised image stitching method based on the YOLOv8 (You Only Look Once version 8) framework that introduces deep homography networks and attention mechanisms. The methodology is partitioned into three distinct stages. The initial stage combines the attention mechanism with a pooling pyramid model to enhance the detection and recognition of compact objects in images; the task of the deep homography networks module is to estimate the global homography of the input images considering multiple viewpoints. The second stage involves preliminary stitching of the masks generated in the initial stage and further enhancement through weighted computation to eliminate common stitching artifacts. The final stage is characterized by adaptive reconstruction and careful refinement of the initial stitching results. Comprehensive experiments across multiple datasets are executed to assess the proposed model. Our method’s Peak Signal-to-Noise Ratio (PSNR) and Structure Similarity Index Measure (SSIM) improved by 10.6% and 6%, respectively. These experimental results confirm the efficacy and utility of the model presented in this paper.
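For reference, PSNR as reported above is conventionally computed from the mean squared error between two images as 10 * log10(MAX^2 / MSE). A minimal sketch follows, assuming 8-bit images flattened to lists of pixel values; the function name `psnr` and the `max_value` parameter are illustrative, not taken from the paper.

```python
import math


def psnr(img_a, img_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a closer match to the reference image. Because PSNR is logarithmic, a relative improvement such as the one quoted in the abstract depends on the baseline value it is measured against.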