Accurate diagnosis of apple leaf diseases is crucial for improving the quality of apple production and promoting the development of the apple industry. However, apple leaf diseases do not differ significantly from ima...Accurate diagnosis of apple leaf diseases is crucial for improving the quality of apple production and promoting the development of the apple industry. However, apple leaf diseases do not differ significantly from image texture and structural information. The difficulties in disease feature extraction in complex backgrounds slow the related research progress. To address the problems, this paper proposes an improved multi-scale inverse bottleneck residual network model based on a triplet parallel attention mechanism, which is built upon ResNet-50, while improving and combining the inception module and ResNext inverse bottleneck blocks, to recognize seven types of apple leaf(including six diseases of alternaria leaf spot, brown spot, grey spot, mosaic, rust, scab, and one healthy). First, the 3×3 convolutions in some of the residual modules are replaced by multi-scale residual convolutions, the convolution kernels of different sizes contained in each branch of the multi-scale convolution are applied to extract feature maps of different sizes, and the outputs of these branches are multi-scale fused by summing to enrich the output features of the images. Second, the global layer-wise dynamic coordinated inverse bottleneck structure is used to reduce the network feature loss. The inverse bottleneck structure makes the image information less lossy when transforming from different dimensional feature spaces. The fusion of multi-scale and layer-wise dynamic coordinated inverse bottlenecks makes the model effectively balances computational efficiency and feature representation capability, and more robust with a combination of horizontal and vertical features in the fine identification of apple leaf diseases. Finally, after each improved module, a triplet parallel attention module is integrated with cross-dimensional interactions among channels through rotations and residual transformations, which improves the parallel search efficiency of important features and the recognition rate of the network with relatively small computational costs while the dimensional dependencies are improved. To verify the validity of the model in this paper, we uniformly enhance apple leaf disease images screened from the public data sets of Plant Village, Baidu Flying Paddle, and the Internet. The final processed image count is 14,000. The ablation study, pre-processing comparison, and method comparison are conducted on the processed datasets. The experimental results demonstrate that the proposed method reaches 98.73% accuracy on the adopted datasets, which is 1.82% higher than the classical ResNet-50 model, and 0.29% better than the apple leaf disease datasets before preprocessing. It also achieves competitive results in apple leaf disease identification compared to some state-ofthe-art methods.展开更多
Tea leaf picking is a crucial stage in tea production that directly influences the quality and value of the tea.Traditional tea-picking machines may compromise the quality of the tea leaves.High-quality teas are often...Tea leaf picking is a crucial stage in tea production that directly influences the quality and value of the tea.Traditional tea-picking machines may compromise the quality of the tea leaves.High-quality teas are often handpicked and need more delicate operations in intelligent picking machines.Compared with traditional image processing techniques,deep learning models have stronger feature extraction capabilities,and better generalization and are more suitable for practical tea shoot harvesting.However,current research mostly focuses on shoot detection and cannot directly accomplish end-to-end shoot segmentation tasks.We propose a tea shoot instance segmentation model based on multi-scale mixed attention(Mask2FusionNet)using a dataset from the tea garden in Hangzhou.We further analyzed the characteristics of the tea shoot dataset,where the proportion of small to medium-sized targets is 89.9%.Our algorithm is compared with several mainstream object segmentation algorithms,and the results demonstrate that our model achieves an accuracy of 82%in recognizing the tea shoots,showing a better performance compared to other models.Through ablation experiments,we found that ResNet50,PointRend strategy,and the Feature Pyramid Network(FPN)architecture can improve performance by 1.6%,1.4%,and 2.4%,respectively.These experiments demonstrated that our proposed multi-scale and point selection strategy optimizes the feature extraction capability for overlapping small targets.The results indicate that the proposed Mask2FusionNet model can perform the shoot segmentation in unstructured environments,realizing the individual distinction of tea shoots,and complete extraction of the shoot edge contours with a segmentation accuracy of 82.0%.The research results can provide algorithmic support for the segmentation and intelligent harvesting of premium tea shoots at different scales.展开更多
Nuclearmagnetic resonance imaging of breasts often presents complex backgrounds.Breast tumors exhibit varying sizes,uneven intensity,and indistinct boundaries.These characteristics can lead to challenges such as low a...Nuclearmagnetic resonance imaging of breasts often presents complex backgrounds.Breast tumors exhibit varying sizes,uneven intensity,and indistinct boundaries.These characteristics can lead to challenges such as low accuracy and incorrect segmentation during tumor segmentation.Thus,we propose a two-stage breast tumor segmentation method leveraging multi-scale features and boundary attention mechanisms.Initially,the breast region of interest is extracted to isolate the breast area from surrounding tissues and organs.Subsequently,we devise a fusion network incorporatingmulti-scale features and boundary attentionmechanisms for breast tumor segmentation.We incorporate multi-scale parallel dilated convolution modules into the network,enhancing its capability to segment tumors of various sizes through multi-scale convolution and novel fusion techniques.Additionally,attention and boundary detection modules are included to augment the network’s capacity to locate tumors by capturing nonlocal dependencies in both spatial and channel domains.Furthermore,a hybrid loss function with boundary weight is employed to address sample class imbalance issues and enhance the network’s boundary maintenance capability through additional loss.Themethod was evaluated using breast data from 207 patients at RuijinHospital,resulting in a 6.64%increase in Dice similarity coefficient compared to the benchmarkU-Net.Experimental results demonstrate the superiority of the method over other segmentation techniques,with fewer model parameters.展开更多
Accurately identifying small objects in high-resolution aerial images presents a complex and crucial task in thefield of small object detection on unmanned aerial vehicles(UAVs).This task is challenging due to variati...Accurately identifying small objects in high-resolution aerial images presents a complex and crucial task in thefield of small object detection on unmanned aerial vehicles(UAVs).This task is challenging due to variations inUAV flight altitude,differences in object scales,as well as factors like flight speed and motion blur.To enhancethe detection efficacy of small targets in drone aerial imagery,we propose an enhanced You Only Look Onceversion 7(YOLOv7)algorithm based on multi-scale spatial context.We build the MSC-YOLO model,whichincorporates an additional prediction head,denoted as P2,to improve adaptability for small objects.We replaceconventional downsampling with a Spatial-to-Depth Convolutional Combination(CSPDC)module to mitigatethe loss of intricate feature details related to small objects.Furthermore,we propose a Spatial Context Pyramidwith Multi-Scale Attention(SCPMA)module,which captures spatial and channel-dependent features of smalltargets acrossmultiple scales.This module enhances the perception of spatial contextual features and the utilizationof multiscale feature information.On the Visdrone2023 and UAVDT datasets,MSC-YOLO achieves remarkableresults,outperforming the baseline method YOLOv7 by 3.0%in terms ofmean average precision(mAP).The MSCYOLOalgorithm proposed in this paper has demonstrated satisfactory performance in detecting small targets inUAV aerial photography,providing strong support for practical applications.展开更多
Multispectral pedestrian detection technology leverages infrared images to provide reliable information for visible light images, demonstrating significant advantages in low-light conditions and background occlusion s...Multispectral pedestrian detection technology leverages infrared images to provide reliable information for visible light images, demonstrating significant advantages in low-light conditions and background occlusion scenarios. However, while continuously improving cross-modal feature extraction and fusion, ensuring the model’s detection speed is also a challenging issue. We have devised a deep learning network model for cross-modal pedestrian detection based on Resnet50, aiming to focus on more reliable features and enhance the model’s detection efficiency. This model employs a spatial attention mechanism to reweight the input visible light and infrared image data, enhancing the model’s focus on different spatial positions and sharing the weighted feature data across different modalities, thereby reducing the interference of multi-modal features. Subsequently, lightweight modules with depthwise separable convolution are incorporated to reduce the model’s parameter count and computational load through channel-wise and point-wise convolutions. The network model algorithm proposed in this paper was experimentally validated on the publicly available KAIST dataset and compared with other existing methods. The experimental results demonstrate that our approach achieves favorable performance in various complex environments, affirming the effectiveness of the multispectral pedestrian detection technology proposed in this paper.展开更多
The semantic segmentation of very high spatial resolution remote sensing images is difficult due to the complexity of interpreting the interactions between the objects in the scene. Indeed, effective segmentation requ...The semantic segmentation of very high spatial resolution remote sensing images is difficult due to the complexity of interpreting the interactions between the objects in the scene. Indeed, effective segmentation requires considering spatial local context and long-term dependencies. To address this problem, the proposed approach is inspired by the MAC-UNet network which is an extension of U-Net, densely connected combined with channel attention. The advantages of this solution are as follows: 4) The new model introduces a new attention called propagate attention to build an attention-based encoder. 2) The fusion of multi-scale information is achieved by a weighted linear combination of the attentions whose coefficients are learned during the training phase. 3) Introducing in the decoder, the Spatial-Channel-Global-Local block which is an attention layer that uniquely combines channel attention and spatial attention locally and globally. The performances of the model are evaluated on 2 datasets WHDLD and DLRSD and show results of mean intersection over union (mIoU) index in progress between 1.54% and 10.47% for DLRSD and between 1.04% and 4.37% for WHDLD compared with the most efficient algorithms with attention mechanisms like MAU-Net and transformers like TMNet.展开更多
Whole brain functional connectivity(FC)patterns obtained from resting-state functional magnetic resonance imaging(rs-fMRI)have been widely used in the diagnosis of brain disorders such as autism spectrum disorder(ASD)...Whole brain functional connectivity(FC)patterns obtained from resting-state functional magnetic resonance imaging(rs-fMRI)have been widely used in the diagnosis of brain disorders such as autism spectrum disorder(ASD).Recently,an increasing number of studies have focused on employing deep learning techniques to analyze FC patterns for brain disease classification.However,the high dimensionality of the FC features and the interpretation of deep learning results are issues that need to be addressed in the FC-based brain disease classification.In this paper,we proposed a multi-scale attention-based deep neural network(MSA-DNN)model to classify FC patterns for the ASD diagnosis.The model was implemented by adding a flexible multi-scale attention(MSA)module to the auto-encoder based backbone DNN,which can extract multi-scale features of the FC patterns and change the level of attention for different FCs by continuous learning.Our model will reinforce the weights of important FC features while suppress the unimportant FCs to ensure the sparsity of the model weights and enhance the model interpretability.We performed systematic experiments on the large multi-sites ASD dataset with both ten-fold and leaveone-site-out cross-validations.Results showed that our model outperformed classical methods in brain disease classification and revealed robust intersite prediction performance.We also localized important FC features and brain regions associated with ASD classification.Overall,our study further promotes the biomarker detection and computer-aided classification for ASD diagnosis,and the proposed MSA module is flexible and easy to implement in other classification networks.展开更多
Background The use of remote photoplethysmography(rPPG)to estimate blood volume pulse in a noncontact manner has been an active research topic in recent years.Existing methods are primarily based on a singlescale regi...Background The use of remote photoplethysmography(rPPG)to estimate blood volume pulse in a noncontact manner has been an active research topic in recent years.Existing methods are primarily based on a singlescale region of interest(ROI).However,some noise signals that are not easily separated in a single-scale space can be easily separated in a multi-scale space.Also,existing spatiotemporal networks mainly focus on local spatiotemporal information and do not emphasize temporal information,which is crucial in pulse extraction problems,resulting in insufficient spatiotemporal feature modelling.Methods Here,we propose a multi-scale facial video pulse extraction network based on separable spatiotemporal convolution(SSTC)and dimension separable attention(DSAT).First,to solve the problem of a single-scale ROI,we constructed a multi-scale feature space for initial signal separation.Second,SSTC and DSAT were designed for efficient spatiotemporal correlation modeling,which increased the information interaction between the long-span time and space dimensions;this placed more emphasis on temporal features.Results The signal-to-noise ratio(SNR)of the proposed network reached 9.58dB on the PURE dataset and 6.77dB on the UBFC-rPPG dataset,outperforming state-of-the-art algorithms.Conclusions The results showed that fusing multi-scale signals yielded better results than methods based on only single-scale signals.The proposed SSTC and dimension-separable attention mechanism will contribute to more accurate pulse signal extraction.展开更多
Similarity relation is one of the spatial relations in the community of geographic information science and cartography.It is widely used in the retrieval of spatial databases, the recognition of spatial objects from i...Similarity relation is one of the spatial relations in the community of geographic information science and cartography.It is widely used in the retrieval of spatial databases, the recognition of spatial objects from images, and the description of spatial features on maps.However, little achievements have been made for it by far.In this paper, spatial similarity relation was put forward with the introduction of automated map generalization in the construction of multi-scale map databases;then the definition of spatial similarity relations was presented based on set theory, the concept of spatial similarity degree was given, and the characteristics of spatial similarity were discussed in detail, in-cluding reflexivity, symmetry, non-transitivity, self-similarity in multi-scale spaces, and scale-dependence.Finally a classification system for spatial similarity relations in multi-scale map spaces was addressed.This research may be useful to automated map generalization, spatial similarity retrieval and spatial reasoning.展开更多
Recently,deep learning-based image inpainting methods have made great strides in reconstructing damaged regions.However,these methods often struggle to produce satisfactory results when dealing with missing images wit...Recently,deep learning-based image inpainting methods have made great strides in reconstructing damaged regions.However,these methods often struggle to produce satisfactory results when dealing with missing images with large holes,leading to distortions in the structure and blurring of textures.To address these problems,we combine the advantages of transformers and convolutions to propose an image inpainting method that incorporates edge priors and attention mechanisms.The proposed method aims to improve the results of inpainting large holes in images by enhancing the accuracy of structure restoration and the ability to recover texture details.This method divides the inpainting task into two phases:edge prediction and image inpainting.Specifically,in the edge prediction phase,a transformer architecture is designed to combine axial attention with standard self-attention.This design enhances the extraction capability of global structural features and location awareness.It also balances the complexity of self-attention operations,resulting in accurate prediction of the edge structure in the defective region.In the image inpainting phase,a multi-scale fusion attention module is introduced.This module makes full use of multi-level distant features and enhances local pixel continuity,thereby significantly improving the quality of image inpainting.To evaluate the performance of our method.comparative experiments are conducted on several datasets,including CelebA,Places2,and Facade.Quantitative experiments show that our method outperforms the other mainstream methods.Specifically,it improves Peak Signal-to-Noise Ratio(PSNR)and Structure Similarity Index Measure(SSIM)by 1.141~3.234 db and 0.083~0.235,respectively.Moreover,it reduces Learning Perceptual Image Patch Similarity(LPIPS)and Mean Absolute Error(MAE)by 0.0347~0.1753 and 0.0104~0.0402,respectively.Qualitative experiments reveal that our method excels at reconstructing images with complete structural information and clear texture details.Furthermore,our model exhibits impressive performance in terms of the number of parameters,memory cost,and testing time.展开更多
Object detection has made a significant leap forward in recent years.However,the detection of small objects continues to be a great difficulty for various reasons,such as they have a very small size and they are susce...Object detection has made a significant leap forward in recent years.However,the detection of small objects continues to be a great difficulty for various reasons,such as they have a very small size and they are susceptible to missed detection due to background noise.Additionally,small object information is affected due to the downsampling operations.Deep learning-based detection methods have been utilized to address the challenge posed by small objects.In this work,we propose a novel method,the Multi-Convolutional Block Attention Network(MCBAN),to increase the detection accuracy of minute objects aiming to overcome the challenge of information loss during the downsampling process.The multi-convolutional attention block(MCAB);channel attention and spatial attention module(SAM)that make up MCAB,have been crafted to accomplish small object detection with higher precision.We have carried out the experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute(KITTI)and Pattern Analysis,Statical Modeling and Computational Learning(PASCAL)Visual Object Classes(VOC)datasets and have followed a step-wise process to analyze the results.These experiment results demonstrate that significant gains in performance are achieved,such as 97.75%for KITTI and 88.97%for PASCAL VOC.The findings of this study assert quite unequivocally the fact that MCBAN is much more efficient in the small object detection domain as compared to other existing approaches.展开更多
The degree of spatial similarity plays an important role in map generalization, yet there has been no quantitative research into it. To fill this gap, this study first defines map scale change and spatial similarity d...The degree of spatial similarity plays an important role in map generalization, yet there has been no quantitative research into it. To fill this gap, this study first defines map scale change and spatial similarity degree/relation in multi-scale map spaces and then proposes a model for calculating the degree of spatial similarity between a point cloud at one scale and its gener- alized counterpart at another scale. After validation, the new model features 16 points with map scale change as the x coordinate and the degree of spatial similarity as the y coordinate. Finally, using an application for curve fitting, the model achieves an empirical formula that can calculate the degree of spatial similarity using map scale change as the sole independent variable, and vice versa. This formula can be used to automate algorithms for point feature generalization and to determine when to terminate them during the generalization.展开更多
Remote sensing imagery,due to its high altitude,presents inherent challenges characterized by multiple scales,limited target areas,and intricate backgrounds.These inherent traits often lead to increased miss and false...Remote sensing imagery,due to its high altitude,presents inherent challenges characterized by multiple scales,limited target areas,and intricate backgrounds.These inherent traits often lead to increased miss and false detection rates when applying object recognition algorithms tailored for remote sensing imagery.Additionally,these complexities contribute to inaccuracies in target localization and hinder precise target categorization.This paper addresses these challenges by proposing a solution:The YOLO-MFD model(YOLO-MFD:Remote Sensing Image Object Detection withMulti-scale Fusion Dynamic Head).Before presenting our method,we delve into the prevalent issues faced in remote sensing imagery analysis.Specifically,we emphasize the struggles of existing object recognition algorithms in comprehensively capturing critical image features amidst varying scales and complex backgrounds.To resolve these issues,we introduce a novel approach.First,we propose the implementation of a lightweight multi-scale module called CEF.This module significantly improves the model’s ability to comprehensively capture important image features by merging multi-scale feature information.It effectively addresses the issues of missed detection and mistaken alarms that are common in remote sensing imagery.Second,an additional layer of small target detection heads is added,and a residual link is established with the higher-level feature extraction module in the backbone section.This allows the model to incorporate shallower information,significantly improving the accuracy of target localization in remotely sensed images.Finally,a dynamic head attentionmechanism is introduced.This allows themodel to exhibit greater flexibility and accuracy in recognizing shapes and targets of different sizes.Consequently,the precision of object detection is significantly improved.The trial results show that the YOLO-MFD model shows improvements of 6.3%,3.5%,and 2.5%over the original YOLOv8 model in Precision,map@0.5 and map@0.5:0.95,separately.These results illustrate the clear advantages of the method.展开更多
The precise and automatic segmentation of prostate magnetic resonance imaging(MRI)images is vital for assisting doctors in diagnosing prostate diseases.In recent years,many advanced methods have been applied to prosta...The precise and automatic segmentation of prostate magnetic resonance imaging(MRI)images is vital for assisting doctors in diagnosing prostate diseases.In recent years,many advanced methods have been applied to prostate segmentation,but due to the variability caused by prostate diseases,automatic segmentation of the prostate presents significant challenges.In this paper,we propose an attention-guided multi-scale feature fusion network(AGMSF-Net)to segment prostate MRI images.We propose an attention mechanism for extracting multi-scale features,and introduce a 3D transformer module to enhance global feature representation by adding it during the transition phase from encoder to decoder.In the decoder stage,a feature fusion module is proposed to obtain global context information.We evaluate our model on MRI images of the prostate acquired from a local hospital.The relative volume difference(RVD)and dice similarity coefficient(DSC)between the results of automatic prostate segmentation and ground truth were 1.21%and 93.68%,respectively.To quantitatively evaluate prostate volume on MRI,which is of significant clinical significance,we propose a unique AGMSF-Net.The essential performance evaluation and validation experiments have demonstrated the effectiveness of our method in automatic prostate segmentation.展开更多
The detection of foreign object intrusion is crucial for ensuring the safety of railway operations.To address challenges such as low efficiency,suboptimal detection accuracy,and slow detection speed inherent in conven...The detection of foreign object intrusion is crucial for ensuring the safety of railway operations.To address challenges such as low efficiency,suboptimal detection accuracy,and slow detection speed inherent in conventional comprehensive video monitoring systems for railways,a railway foreign object intrusion recognition and detection system is conceived and implemented using edge computing and deep learning technologies.In a bid to raise detection accuracy,the convolutional block attention module(CBAM),including spatial and channel attention modules,is seamlessly integrated into the YOLOv5 model,giving rise to the CBAM-YOLOv5 model.Furthermore,the distance intersection-over-union_non-maximum suppression(DIo U_NMS)algorithm is employed in lieu of the weighted nonmaximum suppression algorithm,resulting in improved detection performance for intrusive targets.To accelerate detection speed,the model undergoes pruning based on the batch normalization(BN)layer,and Tensor RT inference acceleration techniques are employed,culminating in the successful deployment of the algorithm on edge devices.The CBAM-YOLOv5 model exhibits a notable 2.1%enhancement in detection accuracy when evaluated on a selfconstructed railway dataset,achieving 95.0%for mean average precision(m AP).Furthermore,the inference speed on edge devices attains a commendable 15 frame/s.展开更多
Aiming at the problem that the existing models have a poor segmentation effect on imbalanced data sets with small-scale samples,a bilateral U-Net network model with a spatial attention mechanism is designed.The model ...Aiming at the problem that the existing models have a poor segmentation effect on imbalanced data sets with small-scale samples,a bilateral U-Net network model with a spatial attention mechanism is designed.The model uses the lightweight MobileNetV2 as the backbone network for feature hierarchical extraction and proposes an Attentive Pyramid Spatial Attention(APSA)module compared to the Attenuated Spatial Pyramid module,which can increase the receptive field and enhance the information,and finally adds the context fusion prediction branch that fuses high-semantic and low-semantic prediction results,and the model effectively improves the segmentation accuracy of small data sets.The experimental results on the CamVid data set show that compared with some existing semantic segmentation networks,the algorithm has a better segmentation effect and segmentation accuracy,and its mIOU reaches 75.85%.Moreover,to verify the generality of the model and the effectiveness of the APSA module,experiments were conducted on the VOC 2012 data set,and the APSA module improved mIOU by about 12.2%.展开更多
The hands and face are the most important parts for expressing sign language morphemes in sign language videos.However,we find that existing Continuous Sign Language Recognition(CSLR)methods lack the mining of hand an...The hands and face are the most important parts for expressing sign language morphemes in sign language videos.However,we find that existing Continuous Sign Language Recognition(CSLR)methods lack the mining of hand and face information in visual backbones or use expensive and time-consuming external extractors to explore this information.In addition,the signs have different lengths,whereas previous CSLR methods typically use a fixed-length window to segment the video to capture sequential features and then perform global temporal modeling,which disturbs the perception of complete signs.In this study,we propose a Multi-Scale Context-Aware network(MSCA-Net)to solve the aforementioned problems.Our MSCA-Net contains two main modules:(1)Multi-Scale Motion Attention(MSMA),which uses the differences among frames to perceive information of the hands and face in multiple spatial scales,replacing the heavy feature extractors;and(2)Multi-Scale Temporal Modeling(MSTM),which explores crucial temporal information in the sign language video from different temporal scales.We conduct extensive experiments using three widely used sign language datasets,i.e.,RWTH-PHOENIX-Weather-2014,RWTH-PHOENIX-Weather-2014T,and CSL-Daily.The proposed MSCA-Net achieve state-of-the-art performance,demonstrating the effectiveness of our approach.展开更多
In order to improve the models capability in expressing features during few-shot learning,a multi-scale features prototypical network(MS-PN)algorithm is proposed.The metric learning algo-rithm is employed to extract i...In order to improve the models capability in expressing features during few-shot learning,a multi-scale features prototypical network(MS-PN)algorithm is proposed.The metric learning algo-rithm is employed to extract image features and project them into a feature space,thus evaluating the similarity between samples based on their relative distances within the metric space.To sufficiently extract feature information from limited sample data and mitigate the impact of constrained data vol-ume,a multi-scale feature extraction network is presented to capture data features at various scales during the process of image feature extraction.Additionally,the position of the prototype is fine-tuned by assigning weights to data points to mitigate the influence of outliers on the experiment.The loss function integrates contrastive loss and label-smoothing to bring similar data points closer and separate dissimilar data points within the metric space.Experimental evaluations are conducted on small-sample datasets mini-ImageNet and CUB200-2011.The method in this paper can achieve higher classification accuracy.Specifically,in the 5-way 1-shot experiment,classification accuracy reaches 50.13%and 66.79%respectively on these two datasets.Moreover,in the 5-way 5-shot ex-periment,accuracy of 66.79%and 85.91%are observed,respectively.展开更多
Imaging quality is a critical component of compressive imaging in real applications. In this study, we propose a compressive imaging method based on multi-scale modulation and reconstruction in the spatial frequency d...Imaging quality is a critical component of compressive imaging in real applications. In this study, we propose a compressive imaging method based on multi-scale modulation and reconstruction in the spatial frequency domain. Theoretical analysis and simulation show the relation between the measurement matrix resolution and compressive sensing(CS)imaging quality. The matrix design is improved to provide multi-scale modulations, followed by individual reconstruction of images of different spatial frequencies. Compared with traditional single-scale CS imaging, the multi-scale method provides high quality imaging in both high and low frequencies, and effectively decreases the overall reconstruction error.Experimental results confirm the feasibility of this technique, especially at low sampling rate. The method may thus be helpful in promoting the implementation of compressive imaging in real applications.展开更多
The relationships between soil total nitrogen(STN)and influencing factors are scale-dependent.The objective of this study was to identify the multi-scale spatial relationships of STN with selected environmental factor...The relationships between soil total nitrogen(STN)and influencing factors are scale-dependent.The objective of this study was to identify the multi-scale spatial relationships of STN with selected environmental factors(elevation,slope and topographic wetness index),intrinsic soil factors(soil bulk density,sand content,silt content,and clay content)and combined environmental factors(including the first two principal components(PC1 and PC2)of the Vis-NIR soil spectra)along three sampling transects located at the upstream,midstream and downstream of Taiyuan Basin on the Chinese Loess Plateau.We separated the multivariate data series of STN and influencing factors at each transect into six intrinsic mode functions(IMFs)and one residue by multivariate empirical mode decomposition(MEMD).Meanwhile,we obtained the predicted equations of STN based on MEMD by stepwise multiple linear regression(SMLR).The results indicated that the dominant scales of explained variance in STN were at scale 995 m for transect 1,at scales 956 and 8852 m for transect 2,and at scales 972,5716 and 12,317 m for transect 3.Multi-scale correlation coefficients between STN and influencing factors were less significant in transect 3 than in transects 1 and 2.The goodness of fit root mean square error(RMSE),normalized root mean square error(NRMSE),and coefficient of determination(R2)indicated that the prediction of STN at the sampling scale by summing all of the predicted IMFs and residue was more accurate than that by SMLR directly.Therefore,the multi-scale method of MEMD has a good potential in characterizing the multi-scale spatial relationships between STN and influencing factors at the basin landscape scale.展开更多
基金supported in part by the General Program Hunan Provincial Natural Science Foundation of 2022,China(2022JJ31022)the Undergraduate Education Reform Project of Hunan Province,China(HNJG-20210532)the National Natural Science Foundation of China(62276276)。
文摘Accurate diagnosis of apple leaf diseases is crucial for improving the quality of apple production and promoting the development of the apple industry. However, apple leaf diseases do not differ significantly from image texture and structural information. The difficulties in disease feature extraction in complex backgrounds slow the related research progress. To address the problems, this paper proposes an improved multi-scale inverse bottleneck residual network model based on a triplet parallel attention mechanism, which is built upon ResNet-50, while improving and combining the inception module and ResNext inverse bottleneck blocks, to recognize seven types of apple leaf(including six diseases of alternaria leaf spot, brown spot, grey spot, mosaic, rust, scab, and one healthy). First, the 3×3 convolutions in some of the residual modules are replaced by multi-scale residual convolutions, the convolution kernels of different sizes contained in each branch of the multi-scale convolution are applied to extract feature maps of different sizes, and the outputs of these branches are multi-scale fused by summing to enrich the output features of the images. Second, the global layer-wise dynamic coordinated inverse bottleneck structure is used to reduce the network feature loss. The inverse bottleneck structure makes the image information less lossy when transforming from different dimensional feature spaces. The fusion of multi-scale and layer-wise dynamic coordinated inverse bottlenecks makes the model effectively balances computational efficiency and feature representation capability, and more robust with a combination of horizontal and vertical features in the fine identification of apple leaf diseases. Finally, after each improved module, a triplet parallel attention module is integrated with cross-dimensional interactions among channels through rotations and residual transformations, which improves the parallel search efficiency of important features and the recognition rate of the network with relatively small computational costs while the dimensional dependencies are improved. To verify the validity of the model in this paper, we uniformly enhance apple leaf disease images screened from the public data sets of Plant Village, Baidu Flying Paddle, and the Internet. The final processed image count is 14,000. The ablation study, pre-processing comparison, and method comparison are conducted on the processed datasets. The experimental results demonstrate that the proposed method reaches 98.73% accuracy on the adopted datasets, which is 1.82% higher than the classical ResNet-50 model, and 0.29% better than the apple leaf disease datasets before preprocessing. It also achieves competitive results in apple leaf disease identification compared to some state-ofthe-art methods.
基金This research was supported by the National Natural Science Foundation of China No.62276086the National Key R&D Program of China No.2022YFD2000100Zhejiang Provincial Natural Science Foundation of China under Grant No.LTGN23D010002.
文摘Tea leaf picking is a crucial stage in tea production that directly influences the quality and value of the tea.Traditional tea-picking machines may compromise the quality of the tea leaves.High-quality teas are often handpicked and need more delicate operations in intelligent picking machines.Compared with traditional image processing techniques,deep learning models have stronger feature extraction capabilities,and better generalization and are more suitable for practical tea shoot harvesting.However,current research mostly focuses on shoot detection and cannot directly accomplish end-to-end shoot segmentation tasks.We propose a tea shoot instance segmentation model based on multi-scale mixed attention(Mask2FusionNet)using a dataset from the tea garden in Hangzhou.We further analyzed the characteristics of the tea shoot dataset,where the proportion of small to medium-sized targets is 89.9%.Our algorithm is compared with several mainstream object segmentation algorithms,and the results demonstrate that our model achieves an accuracy of 82%in recognizing the tea shoots,showing a better performance compared to other models.Through ablation experiments,we found that ResNet50,PointRend strategy,and the Feature Pyramid Network(FPN)architecture can improve performance by 1.6%,1.4%,and 2.4%,respectively.These experiments demonstrated that our proposed multi-scale and point selection strategy optimizes the feature extraction capability for overlapping small targets.The results indicate that the proposed Mask2FusionNet model can perform the shoot segmentation in unstructured environments,realizing the individual distinction of tea shoots,and complete extraction of the shoot edge contours with a segmentation accuracy of 82.0%.The research results can provide algorithmic support for the segmentation and intelligent harvesting of premium tea shoots at different scales.
基金funded by the National Natural Foundation of China under Grant No.61172167the Science Fund Project of Heilongjiang Province(LH2020F035).
文摘Nuclearmagnetic resonance imaging of breasts often presents complex backgrounds.Breast tumors exhibit varying sizes,uneven intensity,and indistinct boundaries.These characteristics can lead to challenges such as low accuracy and incorrect segmentation during tumor segmentation.Thus,we propose a two-stage breast tumor segmentation method leveraging multi-scale features and boundary attention mechanisms.Initially,the breast region of interest is extracted to isolate the breast area from surrounding tissues and organs.Subsequently,we devise a fusion network incorporatingmulti-scale features and boundary attentionmechanisms for breast tumor segmentation.We incorporate multi-scale parallel dilated convolution modules into the network,enhancing its capability to segment tumors of various sizes through multi-scale convolution and novel fusion techniques.Additionally,attention and boundary detection modules are included to augment the network’s capacity to locate tumors by capturing nonlocal dependencies in both spatial and channel domains.Furthermore,a hybrid loss function with boundary weight is employed to address sample class imbalance issues and enhance the network’s boundary maintenance capability through additional loss.Themethod was evaluated using breast data from 207 patients at RuijinHospital,resulting in a 6.64%increase in Dice similarity coefficient compared to the benchmarkU-Net.Experimental results demonstrate the superiority of the method over other segmentation techniques,with fewer model parameters.
基金the Key Research and Development Program of Hainan Province(Grant Nos.ZDYF2023GXJS163,ZDYF2024GXJS014)National Natural Science Foundation of China(NSFC)(Grant Nos.62162022,62162024)+2 种基金the Major Science and Technology Project of Hainan Province(Grant No.ZDKJ2020012)Hainan Provincial Natural Science Foundation of China(Grant No.620MS021)Youth Foundation Project of Hainan Natural Science Foundation(621QN211).
文摘Accurately identifying small objects in high-resolution aerial images presents a complex and crucial task in thefield of small object detection on unmanned aerial vehicles(UAVs).This task is challenging due to variations inUAV flight altitude,differences in object scales,as well as factors like flight speed and motion blur.To enhancethe detection efficacy of small targets in drone aerial imagery,we propose an enhanced You Only Look Onceversion 7(YOLOv7)algorithm based on multi-scale spatial context.We build the MSC-YOLO model,whichincorporates an additional prediction head,denoted as P2,to improve adaptability for small objects.We replaceconventional downsampling with a Spatial-to-Depth Convolutional Combination(CSPDC)module to mitigatethe loss of intricate feature details related to small objects.Furthermore,we propose a Spatial Context Pyramidwith Multi-Scale Attention(SCPMA)module,which captures spatial and channel-dependent features of smalltargets acrossmultiple scales.This module enhances the perception of spatial contextual features and the utilizationof multiscale feature information.On the Visdrone2023 and UAVDT datasets,MSC-YOLO achieves remarkableresults,outperforming the baseline method YOLOv7 by 3.0%in terms ofmean average precision(mAP).The MSCYOLOalgorithm proposed in this paper has demonstrated satisfactory performance in detecting small targets inUAV aerial photography,providing strong support for practical applications.
基金supported by the Henan Provincial Science and Technology Research Project under Grants 232102211006,232102210044,232102211017,232102210055 and 222102210214the Science and Technology Innovation Project of Zhengzhou University of Light Industry under Grant 23XNKJTD0205+1 种基金the Undergraduate Universities Smart Teaching Special Research Project of Henan Province under Grant Jiao Gao[2021]No.489-29the Doctor Natural Science Foundation of Zhengzhou University of Light Industry under Grants 2021BSJJ025 and 2022BSJJZK13.
文摘Multispectral pedestrian detection technology leverages infrared images to provide reliable information for visible light images, demonstrating significant advantages in low-light conditions and background occlusion scenarios. However, while continuously improving cross-modal feature extraction and fusion, ensuring the model’s detection speed is also a challenging issue. We have devised a deep learning network model for cross-modal pedestrian detection based on Resnet50, aiming to focus on more reliable features and enhance the model’s detection efficiency. This model employs a spatial attention mechanism to reweight the input visible light and infrared image data, enhancing the model’s focus on different spatial positions and sharing the weighted feature data across different modalities, thereby reducing the interference of multi-modal features. Subsequently, lightweight modules with depthwise separable convolution are incorporated to reduce the model’s parameter count and computational load through channel-wise and point-wise convolutions. The network model algorithm proposed in this paper was experimentally validated on the publicly available KAIST dataset and compared with other existing methods. The experimental results demonstrate that our approach achieves favorable performance in various complex environments, affirming the effectiveness of the multispectral pedestrian detection technology proposed in this paper.
文摘The semantic segmentation of very high spatial resolution remote sensing images is difficult due to the complexity of interpreting the interactions between the objects in the scene. Indeed, effective segmentation requires considering spatial local context and long-term dependencies. To address this problem, the proposed approach is inspired by the MAC-UNet network which is an extension of U-Net, densely connected combined with channel attention. The advantages of this solution are as follows: 4) The new model introduces a new attention called propagate attention to build an attention-based encoder. 2) The fusion of multi-scale information is achieved by a weighted linear combination of the attentions whose coefficients are learned during the training phase. 3) Introducing in the decoder, the Spatial-Channel-Global-Local block which is an attention layer that uniquely combines channel attention and spatial attention locally and globally. The performances of the model are evaluated on 2 datasets WHDLD and DLRSD and show results of mean intersection over union (mIoU) index in progress between 1.54% and 10.47% for DLRSD and between 1.04% and 4.37% for WHDLD compared with the most efficient algorithms with attention mechanisms like MAU-Net and transformers like TMNet.
基金This work was supported by the National Natural Science Foundation of China(No.61906006).
文摘Whole brain functional connectivity(FC)patterns obtained from resting-state functional magnetic resonance imaging(rs-fMRI)have been widely used in the diagnosis of brain disorders such as autism spectrum disorder(ASD).Recently,an increasing number of studies have focused on employing deep learning techniques to analyze FC patterns for brain disease classification.However,the high dimensionality of the FC features and the interpretation of deep learning results are issues that need to be addressed in the FC-based brain disease classification.In this paper,we proposed a multi-scale attention-based deep neural network(MSA-DNN)model to classify FC patterns for the ASD diagnosis.The model was implemented by adding a flexible multi-scale attention(MSA)module to the auto-encoder based backbone DNN,which can extract multi-scale features of the FC patterns and change the level of attention for different FCs by continuous learning.Our model will reinforce the weights of important FC features while suppress the unimportant FCs to ensure the sparsity of the model weights and enhance the model interpretability.We performed systematic experiments on the large multi-sites ASD dataset with both ten-fold and leaveone-site-out cross-validations.Results showed that our model outperformed classical methods in brain disease classification and revealed robust intersite prediction performance.We also localized important FC features and brain regions associated with ASD classification.Overall,our study further promotes the biomarker detection and computer-aided classification for ASD diagnosis,and the proposed MSA module is flexible and easy to implement in other classification networks.
基金Supported by the National Natural Science Foundation of China(61903336,61976190)the Natural Science Foundation of Zhejiang Province(LY21F030015)。
文摘Background The use of remote photoplethysmography(rPPG)to estimate blood volume pulse in a noncontact manner has been an active research topic in recent years.Existing methods are primarily based on a singlescale region of interest(ROI).However,some noise signals that are not easily separated in a single-scale space can be easily separated in a multi-scale space.Also,existing spatiotemporal networks mainly focus on local spatiotemporal information and do not emphasize temporal information,which is crucial in pulse extraction problems,resulting in insufficient spatiotemporal feature modelling.Methods Here,we propose a multi-scale facial video pulse extraction network based on separable spatiotemporal convolution(SSTC)and dimension separable attention(DSAT).First,to solve the problem of a single-scale ROI,we constructed a multi-scale feature space for initial signal separation.Second,SSTC and DSAT were designed for efficient spatiotemporal correlation modeling,which increased the information interaction between the long-span time and space dimensions;this placed more emphasis on temporal features.Results The signal-to-noise ratio(SNR)of the proposed network reached 9.58dB on the PURE dataset and 6.77dB on the UBFC-rPPG dataset,outperforming state-of-the-art algorithms.Conclusions The results showed that fusing multi-scale signals yielded better results than methods based on only single-scale signals.The proposed SSTC and dimension-separable attention mechanism will contribute to more accurate pulse signal extraction.
文摘Similarity relation is one of the spatial relations in the community of geographic information science and cartography.It is widely used in the retrieval of spatial databases, the recognition of spatial objects from images, and the description of spatial features on maps.However, little achievements have been made for it by far.In this paper, spatial similarity relation was put forward with the introduction of automated map generalization in the construction of multi-scale map databases;then the definition of spatial similarity relations was presented based on set theory, the concept of spatial similarity degree was given, and the characteristics of spatial similarity were discussed in detail, in-cluding reflexivity, symmetry, non-transitivity, self-similarity in multi-scale spaces, and scale-dependence.Finally a classification system for spatial similarity relations in multi-scale map spaces was addressed.This research may be useful to automated map generalization, spatial similarity retrieval and spatial reasoning.
基金supported in part by the National Natural Science Foundation of China under Grant 62062061/in part by the Major Project Cultivation Fund of Xizang Minzu University under Grant 324112300447.
文摘Recently,deep learning-based image inpainting methods have made great strides in reconstructing damaged regions.However,these methods often struggle to produce satisfactory results when dealing with missing images with large holes,leading to distortions in the structure and blurring of textures.To address these problems,we combine the advantages of transformers and convolutions to propose an image inpainting method that incorporates edge priors and attention mechanisms.The proposed method aims to improve the results of inpainting large holes in images by enhancing the accuracy of structure restoration and the ability to recover texture details.This method divides the inpainting task into two phases:edge prediction and image inpainting.Specifically,in the edge prediction phase,a transformer architecture is designed to combine axial attention with standard self-attention.This design enhances the extraction capability of global structural features and location awareness.It also balances the complexity of self-attention operations,resulting in accurate prediction of the edge structure in the defective region.In the image inpainting phase,a multi-scale fusion attention module is introduced.This module makes full use of multi-level distant features and enhances local pixel continuity,thereby significantly improving the quality of image inpainting.To evaluate the performance of our method.comparative experiments are conducted on several datasets,including CelebA,Places2,and Facade.Quantitative experiments show that our method outperforms the other mainstream methods.Specifically,it improves Peak Signal-to-Noise Ratio(PSNR)and Structure Similarity Index Measure(SSIM)by 1.141~3.234 db and 0.083~0.235,respectively.Moreover,it reduces Learning Perceptual Image Patch Similarity(LPIPS)and Mean Absolute Error(MAE)by 0.0347~0.1753 and 0.0104~0.0402,respectively.Qualitative experiments reveal that our method excels at reconstructing images with complete structural information and clear texture details.Furthermore,our model exhibits impressive performance in terms of the number of parameters,memory cost,and testing time.
基金funded by Yayasan UTP FRG(YUTP-FRG),grant number 015LC0-280 and Computer and Information Science Department of Universiti Teknologi PETRONAS.
文摘Object detection has made a significant leap forward in recent years.However,the detection of small objects continues to be a great difficulty for various reasons,such as they have a very small size and they are susceptible to missed detection due to background noise.Additionally,small object information is affected due to the downsampling operations.Deep learning-based detection methods have been utilized to address the challenge posed by small objects.In this work,we propose a novel method,the Multi-Convolutional Block Attention Network(MCBAN),to increase the detection accuracy of minute objects aiming to overcome the challenge of information loss during the downsampling process.The multi-convolutional attention block(MCAB);channel attention and spatial attention module(SAM)that make up MCAB,have been crafted to accomplish small object detection with higher precision.We have carried out the experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute(KITTI)and Pattern Analysis,Statical Modeling and Computational Learning(PASCAL)Visual Object Classes(VOC)datasets and have followed a step-wise process to analyze the results.These experiment results demonstrate that significant gains in performance are achieved,such as 97.75%for KITTI and 88.97%for PASCAL VOC.The findings of this study assert quite unequivocally the fact that MCBAN is much more efficient in the small object detection domain as compared to other existing approaches.
基金funded by the Natural Science Foundation Committee,China(41364001,41371435)
文摘The degree of spatial similarity plays an important role in map generalization, yet there has been no quantitative research into it. To fill this gap, this study first defines map scale change and spatial similarity degree/relation in multi-scale map spaces and then proposes a model for calculating the degree of spatial similarity between a point cloud at one scale and its gener- alized counterpart at another scale. After validation, the new model features 16 points with map scale change as the x coordinate and the degree of spatial similarity as the y coordinate. Finally, using an application for curve fitting, the model achieves an empirical formula that can calculate the degree of spatial similarity using map scale change as the sole independent variable, and vice versa. This formula can be used to automate algorithms for point feature generalization and to determine when to terminate them during the generalization.
基金the Scientific Research Fund of Hunan Provincial Education Department(23A0423).
文摘Remote sensing imagery,due to its high altitude,presents inherent challenges characterized by multiple scales,limited target areas,and intricate backgrounds.These inherent traits often lead to increased miss and false detection rates when applying object recognition algorithms tailored for remote sensing imagery.Additionally,these complexities contribute to inaccuracies in target localization and hinder precise target categorization.This paper addresses these challenges by proposing a solution:The YOLO-MFD model(YOLO-MFD:Remote Sensing Image Object Detection withMulti-scale Fusion Dynamic Head).Before presenting our method,we delve into the prevalent issues faced in remote sensing imagery analysis.Specifically,we emphasize the struggles of existing object recognition algorithms in comprehensively capturing critical image features amidst varying scales and complex backgrounds.To resolve these issues,we introduce a novel approach.First,we propose the implementation of a lightweight multi-scale module called CEF.This module significantly improves the model’s ability to comprehensively capture important image features by merging multi-scale feature information.It effectively addresses the issues of missed detection and mistaken alarms that are common in remote sensing imagery.Second,an additional layer of small target detection heads is added,and a residual link is established with the higher-level feature extraction module in the backbone section.This allows the model to incorporate shallower information,significantly improving the accuracy of target localization in remotely sensed images.Finally,a dynamic head attentionmechanism is introduced.This allows themodel to exhibit greater flexibility and accuracy in recognizing shapes and targets of different sizes.Consequently,the precision of object detection is significantly improved.The trial results show that the YOLO-MFD model shows improvements of 6.3%,3.5%,and 2.5%over the original YOLOv8 model in Precision,map@0.5 and map@0.5:0.95,separately.These results illustrate the clear advantages of the method.
基金This work was supported in part by the National Natural Science Foundation of China(Grant#:82260362)in part by the National Key R&D Program of China(Grant#:2021ZD0111000)+1 种基金in part by the Key R&D Project of Hainan Province(Grant#:ZDYF2021SHFZ243)in part by the Major Science and Technology Project of Haikou(Grant#:2020-009).
文摘The precise and automatic segmentation of prostate magnetic resonance imaging(MRI)images is vital for assisting doctors in diagnosing prostate diseases.In recent years,many advanced methods have been applied to prostate segmentation,but due to the variability caused by prostate diseases,automatic segmentation of the prostate presents significant challenges.In this paper,we propose an attention-guided multi-scale feature fusion network(AGMSF-Net)to segment prostate MRI images.We propose an attention mechanism for extracting multi-scale features,and introduce a 3D transformer module to enhance global feature representation by adding it during the transition phase from encoder to decoder.In the decoder stage,a feature fusion module is proposed to obtain global context information.We evaluate our model on MRI images of the prostate acquired from a local hospital.The relative volume difference(RVD)and dice similarity coefficient(DSC)between the results of automatic prostate segmentation and ground truth were 1.21%and 93.68%,respectively.To quantitatively evaluate prostate volume on MRI,which is of significant clinical significance,we propose a unique AGMSF-Net.The essential performance evaluation and validation experiments have demonstrated the effectiveness of our method in automatic prostate segmentation.
基金supported in part by the Science and Technology Innovation Project of CHN Energy Shuo Huang Railway Development Company Ltd(No.SHTL-22-28)the Beijing Natural Science Foundation Fengtai Urban Rail Transit Frontier Research Joint Fund(No.L231002)the Major Project of China State Railway Group Co.,Ltd.(No.K2023T003)。
文摘The detection of foreign object intrusion is crucial for ensuring the safety of railway operations.To address challenges such as low efficiency,suboptimal detection accuracy,and slow detection speed inherent in conventional comprehensive video monitoring systems for railways,a railway foreign object intrusion recognition and detection system is conceived and implemented using edge computing and deep learning technologies.In a bid to raise detection accuracy,the convolutional block attention module(CBAM),including spatial and channel attention modules,is seamlessly integrated into the YOLOv5 model,giving rise to the CBAM-YOLOv5 model.Furthermore,the distance intersection-over-union_non-maximum suppression(DIo U_NMS)algorithm is employed in lieu of the weighted nonmaximum suppression algorithm,resulting in improved detection performance for intrusive targets.To accelerate detection speed,the model undergoes pruning based on the batch normalization(BN)layer,and Tensor RT inference acceleration techniques are employed,culminating in the successful deployment of the algorithm on edge devices.The CBAM-YOLOv5 model exhibits a notable 2.1%enhancement in detection accuracy when evaluated on a selfconstructed railway dataset,achieving 95.0%for mean average precision(m AP).Furthermore,the inference speed on edge devices attains a commendable 15 frame/s.
基金Ministry of Science and Technology Basic Resources Survey Special Project,Grant/Award Number:2019FY100900High-level Hospital Construction Project,Grant/Award Number:DFJH2019015+2 种基金National Natural Science Foundation of China,Grant/Award Number:61871021Guangdong Natural Science Foundation,Grant/Award Number:2019A1515011676Beijing Key Laboratory of Robotics Bionic and Functional Research。
文摘Aiming at the problem that the existing models have a poor segmentation effect on imbalanced data sets with small-scale samples,a bilateral U-Net network model with a spatial attention mechanism is designed.The model uses the lightweight MobileNetV2 as the backbone network for feature hierarchical extraction and proposes an Attentive Pyramid Spatial Attention(APSA)module compared to the Attenuated Spatial Pyramid module,which can increase the receptive field and enhance the information,and finally adds the context fusion prediction branch that fuses high-semantic and low-semantic prediction results,and the model effectively improves the segmentation accuracy of small data sets.The experimental results on the CamVid data set show that compared with some existing semantic segmentation networks,the algorithm has a better segmentation effect and segmentation accuracy,and its mIOU reaches 75.85%.Moreover,to verify the generality of the model and the effectiveness of the APSA module,experiments were conducted on the VOC 2012 data set,and the APSA module improved mIOU by about 12.2%.
基金Supported by the National Natural Science Foundation of China(62072334).
文摘The hands and face are the most important parts for expressing sign language morphemes in sign language videos.However,we find that existing Continuous Sign Language Recognition(CSLR)methods lack the mining of hand and face information in visual backbones or use expensive and time-consuming external extractors to explore this information.In addition,the signs have different lengths,whereas previous CSLR methods typically use a fixed-length window to segment the video to capture sequential features and then perform global temporal modeling,which disturbs the perception of complete signs.In this study,we propose a Multi-Scale Context-Aware network(MSCA-Net)to solve the aforementioned problems.Our MSCA-Net contains two main modules:(1)Multi-Scale Motion Attention(MSMA),which uses the differences among frames to perceive information of the hands and face in multiple spatial scales,replacing the heavy feature extractors;and(2)Multi-Scale Temporal Modeling(MSTM),which explores crucial temporal information in the sign language video from different temporal scales.We conduct extensive experiments using three widely used sign language datasets,i.e.,RWTH-PHOENIX-Weather-2014,RWTH-PHOENIX-Weather-2014T,and CSL-Daily.The proposed MSCA-Net achieve state-of-the-art performance,demonstrating the effectiveness of our approach.
基金the Scientific Research Foundation of Liaoning Provincial Department of Education(No.LJKZ0139)the Program for Liaoning Excellent Talents in University(No.LR15045).
文摘In order to improve the models capability in expressing features during few-shot learning,a multi-scale features prototypical network(MS-PN)algorithm is proposed.The metric learning algo-rithm is employed to extract image features and project them into a feature space,thus evaluating the similarity between samples based on their relative distances within the metric space.To sufficiently extract feature information from limited sample data and mitigate the impact of constrained data vol-ume,a multi-scale feature extraction network is presented to capture data features at various scales during the process of image feature extraction.Additionally,the position of the prototype is fine-tuned by assigning weights to data points to mitigate the influence of outliers on the experiment.The loss function integrates contrastive loss and label-smoothing to bring similar data points closer and separate dissimilar data points within the metric space.Experimental evaluations are conducted on small-sample datasets mini-ImageNet and CUB200-2011.The method in this paper can achieve higher classification accuracy.Specifically,in the 5-way 1-shot experiment,classification accuracy reaches 50.13%and 66.79%respectively on these two datasets.Moreover,in the 5-way 5-shot ex-periment,accuracy of 66.79%and 85.91%are observed,respectively.
基金Project supported by the National Natural Science Foundation of China(Grant Nos.61601442,61605218,and 61575207)the National Key Research and Development Program of China(Grant No.2018YFB0504302)the Youth Innovation Promotion Association of the Chinese Academy of Sciences(Grant Nos.2015124 and 2019154)。
文摘Imaging quality is a critical component of compressive imaging in real applications. In this study, we propose a compressive imaging method based on multi-scale modulation and reconstruction in the spatial frequency domain. Theoretical analysis and simulation show the relation between the measurement matrix resolution and compressive sensing(CS)imaging quality. The matrix design is improved to provide multi-scale modulations, followed by individual reconstruction of images of different spatial frequencies. Compared with traditional single-scale CS imaging, the multi-scale method provides high quality imaging in both high and low frequencies, and effectively decreases the overall reconstruction error.Experimental results confirm the feasibility of this technique, especially at low sampling rate. The method may thus be helpful in promoting the implementation of compressive imaging in real applications.
基金financially supported by the Research Project of Shanxi Scholarship Council of China (2017– 075)the Natural Science foundation for Young Scientists of Shanxi Province (201801D221103)the Innovation Grant of Shanxi Agricultural University (2017ZZ07)
文摘The relationships between soil total nitrogen(STN)and influencing factors are scale-dependent.The objective of this study was to identify the multi-scale spatial relationships of STN with selected environmental factors(elevation,slope and topographic wetness index),intrinsic soil factors(soil bulk density,sand content,silt content,and clay content)and combined environmental factors(including the first two principal components(PC1 and PC2)of the Vis-NIR soil spectra)along three sampling transects located at the upstream,midstream and downstream of Taiyuan Basin on the Chinese Loess Plateau.We separated the multivariate data series of STN and influencing factors at each transect into six intrinsic mode functions(IMFs)and one residue by multivariate empirical mode decomposition(MEMD).Meanwhile,we obtained the predicted equations of STN based on MEMD by stepwise multiple linear regression(SMLR).The results indicated that the dominant scales of explained variance in STN were at scale 995 m for transect 1,at scales 956 and 8852 m for transect 2,and at scales 972,5716 and 12,317 m for transect 3.Multi-scale correlation coefficients between STN and influencing factors were less significant in transect 3 than in transects 1 and 2.The goodness of fit root mean square error(RMSE),normalized root mean square error(NRMSE),and coefficient of determination(R2)indicated that the prediction of STN at the sampling scale by summing all of the predicted IMFs and residue was more accurate than that by SMLR directly.Therefore,the multi-scale method of MEMD has a good potential in characterizing the multi-scale spatial relationships between STN and influencing factors at the basin landscape scale.