Object segmentation and recognition is an important area of computer vision and machine learning that identifies and separates individual objects within an image or video and determines their classes or categories based on their features. The proposed system presents a distinctive approach to object segmentation and recognition using Artificial Neural Networks (ANNs). The system takes RGB images as input and uses a k-means clustering-based segmentation technique to partition the relevant parts of the images into different regions and label them based on their characteristics. Then, two distinct kinds of features are obtained from the segmented images to help identify the objects of interest. An ANN is then used to recognize the objects based on their features. Experiments were carried out on three standard datasets, MSRC, MS COCO, and Caltech 101, which are extensively used in object recognition research, to measure the effectiveness of the proposed approach. The experimental findings support the proposed system's validity, as it achieved class recognition accuracies of 89%, 83%, and 90.30% on the MSRC, MS COCO, and Caltech 101 datasets, respectively.
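The k-means segmentation step described in this abstract can be illustrated with a minimal sketch. It clusters pixel colours only; the function name `kmeans_segment`, the parameters, and the toy image are hypothetical and are not taken from the paper, whose actual segmentation pipeline and features are not specified here.

```python
import numpy as np

def kmeans_segment(image, k=3, iters=20, seed=0):
    """Cluster the pixels of an RGB image into k colour regions.

    image: array of shape (H, W, 3). Returns an (H, W) array of region labels.
    Assumes the image contains at least k distinct colours.
    """
    pixels = image.reshape(-1, 3).astype(float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct pixel colours.
    unique = np.unique(pixels, axis=0)
    centroids = unique[rng.choice(len(unique), size=k, replace=False)]
    for _ in range(iters):
        # Distance of every pixel to every centroid, shape (N, k).
        dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels.reshape(image.shape[:2])

# Toy image: left half red, right half blue.
toy = np.zeros((4, 6, 3), dtype=np.uint8)
toy[:, :3] = (255, 0, 0)
toy[:, 3:] = (0, 0, 255)
segments = kmeans_segment(toy, k=2)
```

In a full system the label map would be followed by feature extraction per region and an ANN classifier; those stages are omitted because the abstract does not describe them in detail.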
At present, emerging solid-phase friction-based additive manufacturing technologies, including friction rolling additive manufacturing (FRAM), can only manufacture simple single-pass components. In this study, multi-layer multi-pass FRAM-deposited aluminum alloy samples were successfully prepared using a non-shoulder tool head. The material flow behavior and microstructure of the overlapped zone between adjacent layers and passes during multi-layer multi-pass FRAM deposition were studied using hybrid 6061 and 5052 aluminum alloys. The results showed that a mechanical interlocking structure was formed between the adjacent layers and the adjacent passes in the overlapped center area. Repeated friction and rolling of the tool head led to different degrees of lateral flow and plastic deformation of the materials in the overlapped zone, which made the recrystallization degree highest in the left and right edge zones of the overlapped zone, followed by the overlapped center zone and the non-overlapped zone. The tensile strength of the overlapped zone exceeded 90% of that of the single-pass deposition sample. Although uneven grooves form on the surface of the overlapping area during multi-layer multi-pass deposition, they can be filled by the flow of material during the deposition of the next layer, thus ensuring the dense microstructure and excellent mechanical properties of the overlapping area. Multi-layer multi-pass FRAM deposition overcomes the limitation on deposition width and lays the foundation for the future deposition of large-scale high-performance components.
A two-stage deep learning algorithm for detecting and recognizing spray-coded numbers on can bottoms is proposed to address the problems of small character areas and fast production line speeds. In the code detection stage, the Differentiable Binarization Network is used as the backbone network, combined with the Attention and Dilation Convolutions Path Aggregation Network feature fusion structure to improve detection performance. For text recognition, the Scene Visual Text Recognition network is trained end-to-end on the code numbers, which alleviates recognition errors caused by image color distortion due to variations in lighting and background noise. In addition, model pruning and quantization are used to reduce the number of model parameters to meet deployment requirements in resource-constrained environments. A comparative experiment was conducted on a dataset of can bottom spray codes collected on-site, and a transfer experiment was conducted on a dataset of packaging box production dates. The experimental results show that the proposed algorithm can effectively locate the codes of cans at different positions on the roller conveyor and can accurately identify the code numbers at high production line speeds. The Hmean of code detection is 97.32%, and the accuracy of code recognition is 98.21%. This verifies that the proposed algorithm achieves high accuracy in both detection and recognition of the code numbers.
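For readers unfamiliar with the Hmean metric quoted above: in text detection it is commonly defined as the harmonic mean of precision and recall. The sketch below assumes that definition; the function name and the counts in the example are illustrative and are not results from the paper.

```python
def hmean(true_positives, num_detections, num_ground_truth):
    """Harmonic mean of detection precision and recall."""
    precision = true_positives / num_detections if num_detections else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 90 correct boxes out of 100 predicted, 95 annotated.
score = hmean(true_positives=90, num_detections=100, num_ground_truth=95)
```

Because the harmonic mean is dominated by the smaller of the two terms, a high Hmean requires both few missed codes and few false detections.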
In computer vision and artificial intelligence, automatic facial expression-based emotion identification has become a popular research and industry problem. Recent demonstrations and applications in several fields, including computer games, smart homes, expression analysis, gesture recognition, surveillance videos, depression therapy, patient monitoring, anxiety, and others, have brought attention to its significant academic and commercial importance. This study focuses on research that has employed only facial images for facial expression recognition (FER), because facial expressions are a basic way that people communicate meaning to each other. The immense success of deep learning has resulted in a growing use of its many architectures to improve performance. This review covers how machine learning, deep learning, and hybrid methods use preprocessing, augmentation techniques, and feature extraction for the temporal properties of successive frames of data. It then gives a brief summary of publicly available assessment criteria and compares them against benchmark results, the most trustworthy way to assess FER-related research statistically. This brief synopsis of the subject may be beneficial both for novices in the field of FER and for seasoned scholars seeking fruitful avenues for further investigation. It conveys fundamental knowledge and provides a comprehensive understanding of the most recent state-of-the-art research.
In recent years, gait-based emotion recognition has been widely applied in the field of computer vision. However, existing gait emotion recognition methods typically rely on complete human skeleton data, and their accuracy declines significantly when the data is occluded. To enhance the accuracy of gait emotion recognition under occlusion, this paper proposes a Multi-scale Suppression Graph Convolutional Network (MS-GCN). The MS-GCN consists of three main components: a Joint Interpolation Module (JI Module), a Multi-scale Temporal Convolution Network (MS-TCN), and a Suppression Graph Convolutional Network (SGCN). The JI Module completes the spatially occluded skeletal joints using the K-Nearest Neighbors (KNN) interpolation method. The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait, compensating for the temporal occlusion of gait information. The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features, thereby reducing the negative impact of occlusion on emotion recognition results. The proposed method is evaluated on two comprehensive datasets: Emotion-Gait, containing 4227 real gaits from sources like BML, ICT-Pollick, and ELMD, and 1000 synthetic gaits generated using STEP-Gen technology, and ELMB, consisting of 3924 gaits, with 1835 labeled with emotions such as “Happy,” “Sad,” “Angry,” and “Neutral.” On the standard datasets Emotion-Gait and ELMB, the proposed method achieved accuracies of 0.900 and 0.896, respectively, attaining performance comparable to other state-of-the-art methods. Furthermore, on occlusion datasets, the proposed method significantly mitigates the performance degradation caused by occlusion, and its accuracy is significantly higher than that of other methods.
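The idea of completing occluded joints by KNN interpolation can be sketched as follows. This is one plausible reading, assuming that a frame with missing joints is filled from the k most similar complete reference poses; the abstract does not give the JI Module's actual formulation, and `knn_fill_joints`, its arguments, and the toy data are hypothetical.

```python
import numpy as np

def knn_fill_joints(frame, reference, k=3):
    """Fill NaN joints in `frame` (J, 3) from the k most similar poses in
    `reference` (N, J, 3), comparing only the joints that are visible."""
    missing = np.isnan(frame).any(axis=1)  # (J,) mask of occluded joints
    if not missing.any():
        return frame.copy()
    visible = ~missing
    # Squared distance over the visible joints to each reference pose.
    diffs = reference[:, visible, :] - frame[visible, :]
    dists = (diffs ** 2).sum(axis=(1, 2))
    nearest = np.argsort(dists)[:k]
    filled = frame.copy()
    # Average the occluded joints over the k nearest reference poses.
    filled[missing] = reference[nearest][:, missing, :].mean(axis=0)
    return filled

# Toy data: four constant reference poses, one frame with joint 2 occluded.
reference = np.stack([np.full((4, 3), v, dtype=float) for v in (0.0, 1.0, 2.0, 10.0)])
frame = np.full((4, 3), 1.0)
frame[2] = np.nan
completed = knn_fill_joints(frame, reference, k=3)
```

Here the three nearest poses are the ones with values 0, 1, and 2, so the occluded joint is filled with their mean.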
Micro-expression (ME) recognition is a complex task that requires advanced techniques to extract informative features from facial expressions. Numerous deep neural networks (DNNs) with convolutional structures have been proposed. However, shallow convolutional neural networks often outperform deeper models in mitigating overfitting, particularly with small datasets. Still, many of these methods rely on a single feature for recognition, resulting in an insufficient ability to extract highly effective features. To address this limitation, this paper introduces an Improved Dual-stream Shallow Convolutional Neural Network based on an Extreme Gradient Boosting Algorithm (IDSSCNN-XgBoost) for ME recognition. The proposed method uses a dual-stream architecture in which motion vectors (temporal features) are extracted using TV-L1 optical flow and subtle changes (spatial features) are amplified via Eulerian Video Magnification (EVM). These features are processed by IDSSCNN, with an attention mechanism applied to refine the extracted features. The outputs are then fused, concatenated, and classified using the XgBoost algorithm. This approach significantly improves recognition accuracy by leveraging the strengths of both temporal and spatial information, supported by the robust classification power of XgBoost. The proposed method is evaluated on three publicly available ME databases: the Chinese Academy of Sciences Micro-expression Database (CASMEII), the Spontaneous Micro-Expression Database (SMIC-HS), and Spontaneous Actions and Micro-Movements (SAMM). Experimental results indicate that the proposed model achieves outstanding results compared to recent models. The accuracies are 79.01%, 69.22%, and 68.99% on CASMEII, SMIC-HS, and SAMM, and the F1-scores are 75.47%, 68.91%, and 63.84%, respectively. The proposed method also has the advantages of operational efficiency and lower computation time.
Human Action Recognition (HAR) has been an active research topic in machine learning for the last few decades. Visual surveillance, robotics, and pedestrian detection are the main applications of action recognition. Computer vision researchers have introduced many HAR techniques, but they still face challenges such as redundant features and computational cost. In this article, we propose a new deep learning method for HAR. In the proposed method, video frames are first pre-processed using a global contrast approach and then used to train a deep learning model through domain transfer learning. The pre-trained ResNet-50 model is used as the deep learning model in this work. Features are extracted from two layers: Global Average Pool (GAP) and Fully Connected (FC). The features of both layers are fused by Canonical Correlation Analysis (CCA). Features are then selected using a Shannon entropy-based threshold function. The selected features are finally passed to multiple classifiers for final classification. Experiments are conducted on five publicly available datasets: IXMAS, UCF Sports, YouTube, UT-Interaction, and KTH. The accuracies achieved on these datasets were 89.6%, 99.7%, 100%, 96.7%, and 96.6%, respectively. Comparison with existing techniques shows that the proposed method provides improved accuracy for HAR. The proposed method is also computationally fast in terms of execution time.
In recent years, skeleton-based action recognition has made great achievements in computer vision. A graph convolutional network (GCN) is effective for action recognition, modelling the human skeleton as a spatio-temporal graph. Most GCNs define the graph topology by the physical relations of the human joints. However, this predefined graph ignores the spatial relationship between non-adjacent joint pairs in special actions and the behavior dependence between joint pairs, resulting in a low recognition rate for specific actions with implicit correlation between joint pairs. In addition, existing methods ignore the trend correlation between adjacent frames within an action and context clues, leading to erroneous recognition of actions with similar poses. Therefore, this study proposes a learnable GCN based on behavior dependence, which considers implicit joint correlation by constructing a dynamic learnable graph that extracts the specific behavior dependence of joint pairs. An adaptive model is constructed using the weight relationship between the joint pairs. The study also designs a self-attention module to obtain the inter-frame topological relationship for exploring the context of actions. By combining the shared topology and the multi-head self-attention map, the module obtains a context-based clue topology to update the dynamic graph convolution, achieving accurate recognition of different actions with similar poses. Detailed experiments on public datasets demonstrate that the proposed method achieves better results and a higher-quality representation of actions under various evaluation protocols compared to state-of-the-art methods.
The marine umbilical is a key piece of equipment for subsea oil and gas exploitation, and it usually integrates a large number of different functional components in multiple layers. The layout of these components directly affects the manufacturing, operation, and storage performance of the umbilical. For the multi-layer cross-sectional layout design of the umbilical, a quantifiable multi-objective optimization model is established according to the operation and storage requirements. Considering manufacturing factors, a multi-layering strategy based on contact point identification is introduced to handle a large number of functional components. Then, the GA-GLM global optimization algorithm is proposed, combining the genetic algorithm and the generalized multiplier method, and the selection operator of the genetic algorithm is improved based on the steepest descent method. The genetic algorithm is used to search the global space and can converge from any initial layout to a feasible layout solution. The feasible layout solution is then taken as the initial value of the generalized multiplier method for a fast and accurate solution. Finally, taking umbilicals with a large number of components as examples, the results show that the cross-sectional performance of the umbilical obtained by the optimization algorithm is better and the solution efficiency is higher. The multi-layering strategy is also shown to be effective and feasible. The design method proposed in this paper can quickly obtain the optimal multi-layer cross-sectional layout, replacing manual design, and provides a useful reference and guidance for the umbilical industry.
Regular exercise is a crucial aspect of daily life, as it enables individuals to stay physically active, lowers the likelihood of developing illnesses, and enhances life expectancy. The recognition of workout actions in video streams holds significant importance in computer vision research, as it aims to enhance exercise adherence, enable instant recognition, advance fitness tracking technologies, and optimize fitness routines. However, existing action datasets often lack diversity and specificity for workout actions, hindering the development of accurate recognition models. To address this gap, the Workout Action Video dataset (WAVd) has been introduced as a significant contribution. WAVd comprises a diverse collection of labeled workout action videos, meticulously curated to encompass various exercises performed by numerous individuals in different settings. This research proposes an innovative framework based on the Attention-driven Residual Deep Convolutional-Gated Recurrent Unit (ResDC-GRU) network for workout action recognition in video streams. Unlike image-based action recognition, videos contain spatio-temporal information, making the task more complex and challenging. While substantial progress has been made in this area, challenges persist in detecting subtle and complex actions, handling occlusions, and managing the computational demands of deep learning approaches. The proposed ResDC-GRU Attention model demonstrated exceptional classification performance with 95.81% accuracy in classifying workout action videos and also outperformed various state-of-the-art models. The method also yielded 81.6%, 97.2%, 95.6%, and 93.2% accuracy on established benchmark datasets, namely HMDB51, Youtube Actions, UCF50, and UCF101, respectively, showcasing its superiority and robustness in action recognition. The findings suggest practical implications in real-world scenarios where precise video action recognition is paramount, addressing the persisting challenges in the field. The WAVd dataset serves as a catalyst for the development of more robust and effective fitness tracking systems and ultimately promotes healthier lifestyles through improved exercise monitoring and analysis.
Humans can perceive our complex world through multi-sensory fusion. Under limited visual conditions, people can sense a variety of tactile signals to identify objects accurately and rapidly. However, replicating this unique capability in robots remains a significant challenge. Here, we present a new form of ultralight multifunctional tactile nano-layered carbon aerogel sensor that provides pressure, temperature, material recognition, and 3D location capabilities, which is combined with multimodal supervised learning algorithms for object recognition. The sensor exhibits human-like pressure (0.04–100 kPa) and temperature (21.5–66.2 °C) detection, millisecond response times (11 ms), a pressure sensitivity of 92.22 kPa⁻¹, and triboelectric durability of over 6000 cycles. The devised algorithm has universality and can accommodate a range of application scenarios. The tactile system can identify common foods in a kitchen scene with 94.63% accuracy and explore the topographic and geomorphic features of a Mars scene with 100% accuracy. This sensing approach empowers robots with versatile tactile perception to advance future society toward heightened sensing, recognition, and intelligence.
One objective of developing machine learning (ML)-based material models is to integrate them with well-established numerical methods to solve boundary value problems (BVPs). In the family of ML models, recurrent neural networks (RNNs) have been extensively applied to capture history-dependent constitutive responses of granular materials, but these multiple-step-based neural networks are neither sufficiently efficient nor aligned with the standard finite element method (FEM). Single-step-based neural networks like the multi-layer perceptron (MLP) are an alternative that bypasses the above issues, but they have to introduce internal variables to encode complex loading histories. In this work, a novel Frobenius norm-based internal variable, together with a Fourier layer and residual architecture-enhanced MLP model, is crafted to replicate the history-dependent constitutive features of the representative volume element (RVE) for granular materials. The obtained ML models are then embedded into the FEM to solve the BVPs of a biaxial compression case and a rigid strip footing case. The obtained solutions are comparable to results from FEM-DEM multiscale modelling but are computed with significantly improved efficiency. The results demonstrate the applicability of the proposed internal variable in enabling the MLP to capture highly nonlinear constitutive responses of granular materials.
In recent years, many unknown protocols have emerged, bringing severe challenges to network security and network management. Existing unknown protocol recognition methods suffer from weak feature extraction ability and cannot thoroughly mine the discriminating features of the protocol data. To address this issue, we propose an unknown application layer protocol recognition method based on deep clustering. Deep clustering, which combines a deep neural network with a clustering algorithm, can automatically extract the features of the input and cluster the data based on the extracted features. Compared with traditional clustering methods, deep clustering achieves higher clustering accuracy. The proposed method uses network-in-network (NIN), channel attention, spatial attention, and Bidirectional Long Short-Term Memory (BLSTM) to construct an autoencoder that extracts the spatial-temporal features of the protocol data, and uses an unsupervised clustering algorithm to recognize the unknown protocols based on these features. The method first extracts the application layer protocol data from the network traffic and transforms the data into a one-dimensional matrix. Second, the autoencoder is pretrained, the protocol data is compressed into a low-dimensional latent space by the autoencoder, and the initial clustering is performed with K-Means. Finally, the clustering loss is calculated and the classification model is optimized according to the clustering loss. The classification results are obtained once the classification model is optimal. Compared with existing unknown protocol recognition methods, the proposed method uses deep clustering to cluster the unknown protocols, and it can mine the key features of the protocol data and recognize the unknown protocols accurately. Experimental results show that the proposed method can effectively recognize unknown protocols and that its performance is better than that of other methods.
Artificial intelligence (AI) technology has become integral to medicine and healthcare, particularly in human activity recognition (HAR) applications such as fitness and rehabilitation tracking. This study introduces a robust coupling analysis framework that integrates four AI-enabled models, combining both machine learning (ML) and deep learning (DL) approaches to evaluate their effectiveness in HAR. The analytical dataset comprises 561 features sourced from the UCI-HAR database, forming the foundation for training the models. Additionally, the MHEALTH database is employed to replicate the modeling process for comparative purposes, while inclusion of the WISDM database, known for its challenging features, tests the framework's resilience and adaptability. The ML-based models employ the adaptive neuro-fuzzy inference system (ANFIS), support vector machine (SVM), and random forest (RF) for data training. In contrast, the DL-based model uses a one-dimensional convolutional neural network (1dCNN) to automate feature extraction. Furthermore, the recursive feature elimination (RFE) algorithm, which drives an ML-based estimator to eliminate low-participation features, helps identify the optimal features for enhancing model performance. The best accuracies of the ANFIS, SVM, RF, and 1dCNN models with a careful feature-selection process are around 90%, 96%, 91%, and 93%, respectively. Comparative analysis using the MHEALTH dataset shows that the 1dCNN model reaches perfect accuracy (100%), while the RF, SVM, and ANFIS models equipped with selected features achieve accuracies of 99.8%, 99.7%, and 96.5%, respectively. Finally, when applied to the WISDM dataset, the DL-based and ML-based models attain accuracies of 91.4% and 87.3%, respectively, aligning with prior research findings. In conclusion, the proposed framework yields HAR models with commendable performance metrics, indicating its suitability for integration into healthcare services through AI-driven applications.
The identification of intercepted radio fuze modulation types is a prerequisite for decision-making in interference systems. However, the electromagnetic environment of modern battlefields is complex, and the signal-to-noise ratio (SNR) of such environments is usually low, which makes it difficult to implement accurate recognition of radio fuzes. To solve the above problem, a radio fuze automatic modulation recognition (AMR) method for low-SNR environments is proposed. First, an adaptive denoising algorithm based on data rearrangement and the two-dimensional (2D) fast Fourier transform (FFT) (DR2D) is used to reduce the noise of the intercepted radio fuze intermediate frequency (IF) signal. Then, the textural features of the denoised IF signal rearranged data matrix are extracted from the statistical indicator vectors of gray-level co-occurrence matrices (GLCMs), and support vector machines (SVMs) are used for classification. The DR2D-based adaptive denoising algorithm achieves an average correlation coefficient of more than 0.76 for ten fuze types under SNRs of −10 dB and above, which is higher than that of other typical algorithms. The trained SVM classification model achieves an average recognition accuracy of more than 96% on seven modulation types and recognition accuracies of more than 94% on each modulation type under SNRs of −12 dB and above, which represents good AMR performance for radio fuzes under low SNRs.
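As an illustration of how texture statistics are derived from a gray-level co-occurrence matrix, the sketch below builds a normalised GLCM for one offset and computes three widely used indicators. The choice of contrast, energy, and homogeneity is an assumption, since the abstract does not list which statistics the authors use; the function names and the small input matrix are hypothetical, and the input is assumed to be already quantised to integer gray levels.

```python
import numpy as np

def glcm(matrix, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix for a non-negative offset (dy, dx)."""
    m = np.asarray(matrix)
    h, w = m.shape
    counts = np.zeros((levels, levels), dtype=float)
    for r in range(h - dy):
        for c in range(w - dx):
            # Count how often level m[r, c] is followed by its offset neighbour.
            counts[m[r, c], m[r + dy, c + dx]] += 1
    return counts / counts.sum()

def glcm_features(p):
    """Common texture statistics of a normalised GLCM."""
    i, j = np.indices(p.shape)
    return {
        "contrast": float((p * (i - j) ** 2).sum()),
        "energy": float((p ** 2).sum()),
        "homogeneity": float((p / (1.0 + np.abs(i - j))).sum()),
    }

# Toy quantised matrix with three gray levels, horizontal neighbour offset.
quantised = [[0, 0, 1],
             [0, 0, 1],
             [2, 2, 2]]
features = glcm_features(glcm(quantised, levels=3))
```

A feature vector of this kind, possibly concatenated over several offsets, is the sort of input that could then be passed to an SVM classifier.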
Advanced Driver Assistance Systems (ADAS) technologies can assist drivers or be part of automatic driving systems to support the driving process and improve the level of safety and comfort on the road. The Traffic Sign Recognition System (TSRS) is one of the most important components of ADAS. Among the challenges with TSRS is recognizing road signs with the highest accuracy and the shortest processing time. Accordingly, this paper introduces a new real-time methodology for recognizing speed limit (SL) signs based on three developed modules. First, the Speed Limit Detection (SLD) module uses the Haar Cascade technique to generate a new SL detector that localizes SL signs within captured frames. Second, the Speed Limit Classification (SLC) module features machine learning classifiers alongside a newly developed model called DeepSL, which uses a CNN architecture to extract intricate features from speed limit sign images for efficient and precise recognition. In addition, a new Speed Limit Classifiers Fusion (SLCF) module has been developed by combining the trained ML classifiers and the DeepSL model using the Dempster-Shafer (DS) theory of belief functions and the voting technique of ensemble learning. Through rigorous software and hardware validation, the proposed methodology achieved F1 scores of 99.98% and 99.96% for DS theory and the voting method, respectively. Furthermore, a prototype encompassing all components demonstrates outstanding reliability and efficacy, with processing times of 150 ms on the Raspberry Pi board and 81.5 ms on the Jetson Nano board, marking a significant advancement in TSRS technology.
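The Dempster-Shafer fusion mentioned above rests on Dempster's rule of combination, which can be sketched in a few lines. The mass values and the two-label frame below are invented for illustration; how the paper converts classifier outputs into mass functions is not stated in the abstract.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions given as {frozenset(hypotheses): mass}."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Hypothetical beliefs of two classifiers over the labels "30" and "50".
either = frozenset({"30", "50"})
clf_a = {frozenset({"30"}): 0.6, frozenset({"50"}): 0.1, either: 0.3}
clf_b = {frozenset({"30"}): 0.7, frozenset({"50"}): 0.2, either: 0.1}
fused = dempster_combine(clf_a, clf_b)
decision = max(fused, key=fused.get)
```

Mass placed on the whole frame (`either`) expresses a classifier's uncertainty, which is what distinguishes this scheme from simple majority voting.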
In the field of computer vision and pattern recognition, understanding human activity from images has gained popularity as a research topic. Activity recognition is the process of determining human behavior based on an image. Here, we implement an Extended Kalman filter to build an activity recognition system. The proposed method first applies an HSI color transformation to improve the clarity of the image frame. Gaussian filters are used to minimize noise, and silhouettes are extracted using a statistical method. Binary Robust Invariant Scalable Keypoints (BRISK) and SIFT are used for feature extraction. The next step is to perform feature discrimination using Gray Wolf. After that, the features are input into the Extended Kalman filter and classified into the relevant human activities according to their definitive characteristics. The experiments use the SUB-Interaction and HMDB51 datasets, with reported recognition rates of 0.88% and 0.86%, respectively.
Fine-grained recognition of ships based on remote sensing images is crucial to safeguarding maritime rights and interests and maintaining national security. With the emergence of massive high-resolution multi-modality images, the use of multi-modality images for fine-grained recognition has become a promising technology. Fine-grained recognition of multi-modality images imposes higher requirements on the dataset samples. The key to the problem is how to extract and fuse the complementary features of multi-modality images to obtain more discriminative fusion features. The attention mechanism helps the model pinpoint the key information in the image, resulting in a significant improvement in the model's performance. In this paper, a dataset for fine-grained recognition of ships based on visible and near-infrared multi-modality remote sensing images is first proposed, named the Dataset for Multimodal Fine-grained Recognition of Ships (DMFGRS). It includes 1,635 pairs of visible and near-infrared remote sensing images divided into 20 categories, collated from digital orthophoto models provided by commercial remote sensing satellites. DMFGRS provides two types of annotation format files, as well as segmentation mask images corresponding to the ship targets. Then, a Multimodal Information Cross-Enhancement Network (MICE-Net), which fuses features of visible and near-infrared remote sensing images, is proposed. In the network, a dual-branch feature extraction and fusion module is designed to obtain more expressive features. The Feature Cross Enhancement Module (FCEM) achieves fusion enhancement of the two modal features by making channel attention and spatial attention work cross-functionally on the feature map. A benchmark is established by evaluating state-of-the-art object recognition algorithms on DMFGRS. In experiments on DMFGRS, MICE-Net reached a precision, recall, mAP0.5, and mAP0.5:0.95 of 87%, 77.1%, 83.8%, and 63.9%, respectively. Extensive experiments demonstrate that the proposed MICE-Net performs better on DMFGRS. Built on the lightweight YOLO network, the model has excellent generalizability and thus good potential for application in real-life scenarios.
This paper proposes a novel open set recognition method, the Spatial Distribution Feature Extraction Network (SDFEN), to address the problem of electromagnetic signal recognition in an open environment. The spatial distribution feature extraction layer in SDFEN replaces the output of convolutional neural networks with spatial distribution features that focus more on inter-sample information by incorporating class center vectors. The designed hybrid loss function considers both intra-class distance and inter-class distance, thereby enhancing the similarity among samples of the same class and increasing the dissimilarity between samples of different classes during training. Consequently, this method allows unknown classes to occupy a larger space in the feature space. This reduces the possibility of overlap with known class samples and makes the boundaries between known and unknown samples more distinct. Additionally, the feature comparator threshold can be used to reject unknown samples. For signal open set recognition, seven methods, including the proposed method, are applied to two kinds of electromagnetic signal data: modulation signals and real-world emitters. The experimental results demonstrate that the proposed method outperforms the other six methods overall in a simulated open environment. Specifically, compared to the state-of-the-art OpenMax method, the novel method achieves up to 8.87% and 5.25% higher micro-F-measures, respectively.
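The threshold-based rejection of unknown samples mentioned in the SDFEN abstract can be illustrated with a much simpler stand-in. The sketch below is not SDFEN: it assumes features are plain vectors, uses the per-class mean as the class center, and rejects a sample when its Euclidean distance to the nearest center exceeds a threshold. The names `class_centers`, `predict_open_set` and the toy data are hypothetical.

```python
import math

def class_centers(samples, labels):
    """Mean feature vector for each known class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_open_set(x, centers, threshold):
    """Nearest known class, or None when x is farther than threshold from every center."""
    best_label, best_dist = None, math.inf
    for label, center in centers.items():
        d = math.dist(x, center)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= threshold else None

# Toy two-class data (illustrative only).
train_x = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2]]
train_y = ["A", "A", "B", "B"]
centers = class_centers(train_x, train_y)

print(predict_open_set([0.1, 0.1], centers, threshold=1.0))    # close to class A
print(predict_open_set([10.0, -10.0], centers, threshold=1.0)) # far from both: rejected
```

A tighter intra-class spread and a larger inter-class gap, which the hybrid loss is said to encourage, would make a single distance threshold like this one easier to choose.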
Sign language, a visual-gestural language used by the deaf and hard-of-hearing community, plays a crucial role in facilitating communication and promoting inclusivity. Sign language recognition (SLR), the process of automatically recognizing and interpreting sign language gestures, has gained significant attention in recent years due to its potential to bridge the communication gap between the hearing impaired and the hearing world. The emergence and continuous development of deep learning techniques have provided inspiration and momentum for advancing SLR. This paper presents a comprehensive and up-to-date analysis of the advancements, challenges, and opportunities in deep learning-based sign language recognition, focusing on the past five years of research. We explore various aspects of SLR, including sign data acquisition technologies, sign language datasets, evaluation methods, and different types of neural networks. Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) have shown promising results in fingerspelling and isolated sign recognition. However, the continuous nature of sign language poses challenges, leading to the exploration of advanced neural network models such as the Transformer model for continuous sign language recognition (CSLR). Despite significant advancements, several challenges remain in the field of SLR. These challenges include expanding sign language datasets, achieving user independence in recognition systems, exploring different input modalities, effectively fusing features, modeling co-articulation, and improving semantic and syntactic understanding. Additionally, developing lightweight network architectures for mobile applications is crucial for practical implementation. By addressing these challenges, we can further advance the field of deep learning for sign language recognition and improve communication for the hearing-impaired community.
Funding: Supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) Support Program (IITP-2023-2018-0-01426) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation); the Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2023R410), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia; and the Deanship of Scientific Research at Najran University, which funded this work under the Research Group Funding Program Grant Code (NU/RG/SERC/12/6).
Abstract: Object segmentation and recognition is an important area of computer vision and machine learning that identifies and separates individual objects within an image or video and determines their classes or categories based on their features. The proposed system presents a distinctive approach to object segmentation and recognition using Artificial Neural Networks (ANNs). The system takes RGB images as input and uses a k-means clustering-based segmentation technique to divide the intended parts of the images into different regions and label them based on their characteristics. Then, two distinct kinds of features are obtained from the segmented images to help identify the objects of interest. An ANN is then used to recognize the objects based on their features. Experiments were carried out with three standard datasets, MSRC, MS COCO, and Caltech 101, which are extensively used in object recognition research, to measure the performance of the suggested approach. The experimental findings support the suggested system’s validity, as it achieved class recognition accuracies of 89%, 83%, and 90.30% on the MSRC, MS COCO, and Caltech 101 datasets, respectively.
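The k-means clustering-based segmentation step named in the abstract above can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' pipeline: it clusters raw RGB tuples only, and the deterministic initialization (the first `k` distinct pixels), the function name `kmeans_pixels` and the toy pixel list are assumptions made for the example.

```python
def kmeans_pixels(pixels, k, iterations=10):
    """Cluster RGB tuples with Lloyd's algorithm; returns (labels, centers)."""
    # Deterministic initialization: the first k distinct pixel values.
    centers = []
    for p in pixels:
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break
    centers = [tuple(float(v) for v in c) for c in centers]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel goes to its nearest center.
        for i, p in enumerate(pixels):
            labels[i] = min(
                range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
        # Update step: each center becomes the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels, centers

# Four toy pixels: two reddish, two bluish (illustrative values).
pixels = [(250, 10, 10), (10, 10, 240), (245, 5, 12), (12, 8, 250)]
labels, centers = kmeans_pixels(pixels, k=2)
print(labels)  # region label per pixel
```

In an image, the label list would be reshaped to the image height and width to obtain the region map; practical systems often add pixel coordinates or texture to the clustered features.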
Funding: Supported by the National Key Research and Development Program of China (No. 2022YFB3404700); the National Natural Science Foundation of China (Nos. 52105313 and 52275299); the Research and Development Program of Beijing Municipal Education Commission, China (No. KM202210005036); the Natural Science Foundation of Chongqing, China (No. CSTB2023NSCQ-MSX0701); and the National Defense Basic Research Projects of China (No. JCKY2022405C002).
Abstract: At present, the emerging solid-phase friction-based additive manufacturing technology, including friction rolling additive manufacturing (FRAM), can only manufacture simple single-pass components. In this study, multi-layer multi-pass FRAM-deposited aluminum alloy samples were successfully prepared using a non-shoulder tool head. The material flow behavior and microstructure of the overlapped zone between adjacent layers and passes during multi-layer multi-pass FRAM deposition were studied using the hybrid 6061 and 5052 aluminum alloys. The results showed that a mechanical interlocking structure was formed between the adjacent layers and the adjacent passes in the overlapped center area. Repeated friction and rolling of the tool head led to different degrees of lateral flow and plastic deformation of the materials in the overlapped zone, which made the recrystallization degree in the left and right edge zones of the overlapped zone the highest, followed by the overlapped center zone and the non-overlapped zone. The tensile strength of the overlapped zone exceeded 90% of that of the single-pass deposition sample. This shows that although there are uneven grooves on the surface of the overlapping area during multi-layer multi-pass deposition, they can be filled by the flow of materials during the deposition of the next layer, thus ensuring the dense microstructure and excellent mechanical properties of the overlapping area. The multi-layer multi-pass FRAM deposition overcomes the limitation of deposition width and lays the foundation for the future deposition of large-scale high-performance components.
Abstract: A two-stage deep learning algorithm for the detection and recognition of spray-code numbers on can bottoms is proposed to address the problems of small character areas and fast production line speeds. In the code detection stage, the Differentiable Binarization Network is used as the backbone network, combined with the Attention and Dilation Convolutions Path Aggregation Network feature fusion structure to enhance the model’s detection performance. In the text recognition stage, end-to-end training of the Scene Visual Text Recognition network alleviates the recognition errors caused by image color distortion due to variations in lighting and background noise. In addition, model pruning and quantization are used to reduce the number of model parameters to meet deployment requirements in resource-constrained environments. A comparative experiment was conducted using a dataset of can bottom spray-code numbers collected on-site, and a transfer experiment was conducted using a dataset of production dates on packaging boxes. The experimental results show that the proposed algorithm can effectively locate the codes of cans at different positions on the roller conveyor and can accurately identify the code numbers at high production line speeds. The Hmean value of code detection is 97.32%, and the accuracy of code recognition is 98.21%. This verifies that the proposed algorithm has high accuracy in both code detection and recognition.
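In text-detection benchmarks, Hmean conventionally denotes the harmonic mean of detection precision and recall. The abstract above does not define it, so the sketch below assumes that convention; the helper names and the counts in the example are illustrative and are not values from the study.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def detection_scores(true_pos, false_pos, false_neg):
    """Precision, recall and Hmean from raw detection counts (hypothetical helper)."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall, hmean(precision, recall)

# Illustrative counts only: 8 correct boxes, 2 spurious boxes, 0 missed boxes.
precision, recall, h = detection_scores(8, 2, 0)
print(f"precision={precision:.3f} recall={recall:.3f} hmean={h:.3f}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, a high Hmean requires both few spurious detections and few missed codes.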
Abstract: In computer vision and artificial intelligence, automatic facial expression-based emotion identification of humans has become a popular research and industry problem. Recent demonstrations and applications in several fields, including computer games, smart homes, expression analysis, gesture recognition, surveillance video, depression therapy, patient monitoring, anxiety, and others, have brought attention to its significant academic and commercial importance. This study emphasizes research that has employed only facial images for facial expression recognition (FER), because facial expressions are a basic way that people communicate meaning to each other. The immense success of deep learning has resulted in a growing use of its many architectures to enhance efficiency. This review covers how machine learning, deep learning, and hybrid methods use preprocessing, augmentation techniques, and feature extraction for temporal properties of successive frames of data. The following section gives a brief summary of publicly available assessment criteria and then compares them with benchmark results, the most trustworthy way to assess FER-related research topics statistically. The brief synopsis of the subject matter in this review may be beneficial for novices in the field of FER as well as seasoned scholars seeking fruitful avenues for further investigation. The information conveys fundamental knowledge and provides a comprehensive understanding of the most recent state-of-the-art research.
Funding: Supported by the National Natural Science Foundation of China (62272049, 62236006, 62172045) and the Key Projects of Beijing Union University (ZKZD202301).
Abstract: In recent years, gait-based emotion recognition has been widely applied in the field of computer vision. However, existing gait emotion recognition methods typically rely on complete human skeleton data, and their accuracy declines significantly when the data is occluded. To enhance the accuracy of gait emotion recognition under occlusion, this paper proposes a Multi-scale Suppression Graph Convolutional Network (MS-GCN). The MS-GCN consists of three main components: the Joint Interpolation Module (JI Module), the Multi-scale Temporal Convolution Network (MS-TCN), and the Suppression Graph Convolutional Network (SGCN). The JI Module completes the spatially occluded skeletal joints using the K-Nearest Neighbors (KNN) interpolation method. The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait, compensating for the temporal occlusion of gait information. The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features, thereby reducing the negative impact of occlusion on emotion recognition results. The proposed method is evaluated on two comprehensive datasets: Emotion-Gait, containing 4227 real gaits from sources like BML, ICT-Pollick, and ELMD, and 1000 synthetic gaits generated using STEP-Gen technology; and ELMB, consisting of 3924 gaits, with 1835 labeled with emotions such as “Happy,” “Sad,” “Angry,” and “Neutral.” On the standard datasets Emotion-Gait and ELMB, the proposed method achieved accuracies of 0.900 and 0.896, respectively, attaining performance comparable to other state-of-the-art methods. Furthermore, on occlusion datasets, the proposed method significantly mitigates the performance degradation caused by occlusion, achieving significantly higher accuracy than other methods.
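The abstract above says only that the JI Module fills occluded joints by KNN interpolation, without giving details. One possible reading is sketched below under explicit assumptions: a gait is a list of frames, each frame a list of 2D joint coordinates with `None` marking an occluded joint, and a missing joint is replaced by the average of the same joint in the `k` temporally nearest frames where it is visible. The function name `knn_fill_joints` and this neighbor definition are hypothetical and are not taken from the paper.

```python
def knn_fill_joints(frames, k=2):
    """Return a copy of frames with occluded joints (None) filled from nearby frames."""
    filled = [list(frame) for frame in frames]
    for t, frame in enumerate(frames):
        for j, joint in enumerate(frame):
            if joint is not None:
                continue
            # Frames where joint j is visible, ordered by temporal distance to t.
            donors = sorted(
                (abs(t - s), s)
                for s, other in enumerate(frames)
                if other[j] is not None
            )
            nearest = [frames[s][j] for _, s in donors[:k]]
            if nearest:
                # Coordinate-wise mean of the k nearest visible observations.
                filled[t][j] = tuple(sum(c) / len(nearest) for c in zip(*nearest))
    return filled

# Three frames, two joints each; joint 0 is occluded in the middle frame.
frames = [
    [(0.0, 0.0), (1.0, 1.0)],
    [None, (1.0, 2.0)],
    [(2.0, 4.0), (1.0, 3.0)],
]
print(knn_fill_joints(frames, k=2)[1][0])
```

A joint that is never visible in any frame stays `None`, so a downstream model would still need a policy for fully missing joints.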
Funding: Supported by the Key Research and Development Program of Jiangsu Province under Grant BE2022059-3, CTBC Bank through the Industry-Academia Cooperation Project, as well as by the Ministry of Science and Technology of Taiwan through Grants MOST-108-2218-E-002-055, MOST-109-2223-E-009-002-MY3, MOST-109-2218-E-009-025, and MOST431109-2218-E-002-015.
Abstract: Micro-expression (ME) recognition is a complex task that requires advanced techniques to extract informative features from facial expressions. Numerous deep neural networks (DNNs) with convolutional structures have been proposed. However, unlike DNNs, shallow convolutional neural networks often outperform deeper models in mitigating overfitting, particularly with small datasets. Still, many of these methods rely on a single feature for recognition, resulting in an insufficient ability to extract highly effective features. To address this limitation, this paper introduces an Improved Dual-stream Shallow Convolutional Neural Network based on an Extreme Gradient Boosting Algorithm (IDSSCNN-XgBoost) for ME recognition. The proposed method utilizes a dual-stream architecture in which motion vectors (temporal features) are extracted using TV-L1 optical flow and subtle changes (spatial features) are amplified via Eulerian Video Magnification (EVM). These features are processed by IDSSCNN, with an attention mechanism applied to refine the extracted effective features. The outputs are then fused, concatenated, and classified using the XgBoost algorithm. This comprehensive approach significantly improves recognition accuracy by leveraging the strengths of both temporal and spatial information, supported by the robust classification power of XgBoost. The proposed method is evaluated on three publicly available ME databases: the Chinese Academy of Sciences Micro-expression Database (CASME II), the Spontaneous Micro-Expression Database (SMIC-HS), and Spontaneous Actions and Micro-Movements (SAMM). Experimental results indicate that the proposed model can achieve outstanding results compared to recent models. The accuracies are 79.01%, 69.22%, and 68.99% on CASME II, SMIC-HS, and SAMM, and the F1-scores are 75.47%, 68.91%, and 63.84%, respectively. The proposed method has the advantage of operational efficiency and less computational time.
Funding: This research was supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0012724, The Competency Development Program for Industry Specialist) and the Soonchunhyang University Research Fund.
Abstract: Human Action Recognition (HAR) has been an active research topic in machine learning for the last few decades. Visual surveillance, robotics, and pedestrian detection are the main applications of action recognition. Computer vision researchers have introduced many HAR techniques, but they still face challenges such as redundant features and the cost of computing. In this article, we propose a new deep learning method for HAR. In the proposed method, video frames are initially pre-processed using a global contrast approach and later used to train a deep learning model using domain transfer learning. The pre-trained ResNet-50 model is used as the deep learning model in this work. Features are extracted from two layers: Global Average Pool (GAP) and Fully Connected (FC). The features of both layers are fused by Canonical Correlation Analysis (CCA). Features are then selected using a Shannon entropy-based threshold function. The selected features are finally passed to multiple classifiers for final classification. Experiments are conducted on five publicly available datasets, namely IXMAS, UCF Sports, YouTube, UT-Interaction, and KTH. The accuracy on these datasets was 89.6%, 99.7%, 100%, 96.7% and 96.6%, respectively. Comparison with existing techniques has shown that the proposed method provides improved accuracy for HAR. The proposed method is also computationally fast in terms of execution time.
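The abstract above mentions selecting features with a Shannon entropy-based threshold function but does not specify it. The sketch below shows one plausible form under stated assumptions: each feature column is discretized into equal-width bins, its Shannon entropy is computed from the bin frequencies, and columns whose entropy is at least the mean entropy are kept. The bin count, the mean-entropy threshold, and the names `shannon_entropy` and `select_by_entropy` are illustrative choices, not the authors' definition.

```python
import math
from collections import Counter

def shannon_entropy(values, bins=4):
    """Entropy (in bits) of values after equal-width binning; 0.0 for a constant column."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    width = (hi - lo) / bins
    # The maximum value is clamped into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def select_by_entropy(feature_matrix):
    """Indices of columns whose entropy is at least the mean entropy over all columns."""
    columns = list(zip(*feature_matrix))
    entropies = [shannon_entropy(col) for col in columns]
    threshold = sum(entropies) / len(entropies)
    return [i for i, h in enumerate(entropies) if h >= threshold]

# Rows are samples, columns are features (toy values).
X = [
    [7.0, 0.0, 0.0],
    [7.0, 1.0, 0.0],
    [7.0, 2.0, 0.0],
    [7.0, 3.0, 1.0],
]
print(select_by_entropy(X))  # indices of the retained feature columns
```

Here the constant first column carries no information and is dropped, which is the kind of redundancy reduction the abstract attributes to the selection step.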
Funding: Supported in part by the 2023 Key Supported Project of the 14th Five Year Plan for Education and Science in Hunan Province (No. ND230795).
Abstract: In recent years, skeleton-based action recognition has made great achievements in computer vision. A graph convolutional network (GCN) is effective for action recognition, modelling the human skeleton as a spatio-temporal graph. Most GCNs define the graph topology by physical relations of the human joints. However, this predefined graph ignores the spatial relationship between non-adjacent joint pairs in special actions and the behavior dependence between joint pairs, resulting in a low recognition rate for specific actions with implicit correlation between joint pairs. In addition, existing methods ignore the trend correlation between adjacent frames within an action and context clues, leading to erroneous recognition of actions with similar poses. Therefore, this study proposes a learnable GCN based on behavior dependence, which considers implicit joint correlation by constructing a dynamic learnable graph that extracts the specific behavior dependence of joint pairs. An adaptive model is constructed using the weight relationship between the joint pairs. The study also designs a self-attention module to obtain the inter-frame topological relationship of the joint pairs for exploring the context of actions. Combining the shared topology and the multi-head self-attention map, the module obtains the context-based clue topology to update the dynamic graph convolution, achieving accurate recognition of different actions with similar poses. Detailed experiments on public datasets demonstrate that the proposed method achieves better results and realizes higher quality representation of actions under various evaluation protocols compared to state-of-the-art methods.
Funding: Financially supported by the National Natural Science Foundation of China (Grant Nos. 52001088, 52271269, U1906233); the Natural Science Foundation of Heilongjiang Province (Grant No. LH2021E050); the State Key Laboratory of Ocean Engineering (Grant No. GKZD010084); Liaoning Province’s Xing Liao Talents Program (Grant No. XLYC2002108); and Dalian City Supports Innovation and Entrepreneurship Projects for High-Level Talents (Grant No. 2021RD16).
Abstract: The marine umbilical is a key piece of equipment for subsea oil and gas exploitation and usually integrates a great number of different functional components in multiple layers. The layout of these components directly affects the manufacturing, operation and storage performance of the umbilical. For the multi-layer cross-sectional layout design of the umbilical, a quantifiable multi-objective optimization model is established according to the operation and storage requirements. Considering the manufacturing factors, a multi-layering strategy based on contact point identification is introduced for a great number of functional components. Then, the GA-GLM global optimization algorithm is proposed by combining the genetic algorithm and the generalized multiplier method, and the selection operator of the genetic algorithm is improved based on the steepest descent method. The genetic algorithm is used to search the global space and can converge from any initial layout to a feasible layout solution. The feasible layout solution is then taken as the initial value of the generalized multiplier method for a fast and accurate solution. Finally, taking umbilicals with a great number of components as examples, the results show that the cross-sectional performance of the umbilical obtained by the optimization algorithm is better and the solution efficiency is higher. Meanwhile, the multi-layering strategy is effective and feasible. The design method proposed in this paper can quickly obtain the optimal multi-layer cross-sectional layout, replacing manual design, and provides useful reference and guidance for the umbilical industry.
Abstract: Regular exercise is a crucial aspect of daily life, as it enables individuals to stay physically active, lowers the likelihood of developing illnesses, and enhances life expectancy. The recognition of workout actions in video streams holds significant importance in computer vision research, as it aims to enhance exercise adherence, enable instant recognition, advance fitness tracking technologies, and optimize fitness routines. However, existing action datasets often lack diversity and specificity for workout actions, hindering the development of accurate recognition models. To address this gap, the Workout Action Video dataset (WAVd) has been introduced as a significant contribution. WAVd comprises a diverse collection of labeled workout action videos, meticulously curated to encompass various exercises performed by numerous individuals in different settings. This research proposes an innovative framework based on the Attention driven Residual Deep Convolutional-Gated Recurrent Unit (ResDC-GRU) network for workout action recognition in video streams. Unlike image-based action recognition, videos contain spatio-temporal information, making the task more complex and challenging. While substantial progress has been made in this area, challenges persist in detecting subtle and complex actions, handling occlusions, and managing the computational demands of deep learning approaches. The proposed ResDC-GRU Attention model demonstrated exceptional classification performance with 95.81% accuracy in classifying workout action videos and also outperformed various state-of-the-art models. The method also yielded 81.6%, 97.2%, 95.6%, and 93.2% accuracy on established benchmark datasets, namely HMDB51, Youtube Actions, UCF50, and UCF101, respectively, showcasing its superiority and robustness in action recognition. The findings suggest practical implications in real-world scenarios where precise video action recognition is paramount, addressing the persisting challenges in the field. The WAVd dataset serves as a catalyst for the development of more robust and effective fitness tracking systems and ultimately promotes healthier lifestyles through improved exercise monitoring and analysis.
Funding: The National Natural Science Foundation of China (Grant No. 52072041); the Beijing Natural Science Foundation (Grant No. JQ21007); the University of Chinese Academy of Sciences (Grant No. Y8540XX2D2); the Robotics Rhino-Bird Focused Research Project (No. 2020-01-002); and the Tencent Robotics X Laboratory.
Abstract: Humans can perceive our complex world through multi-sensory fusion. Under limited visual conditions, people can sense a variety of tactile signals to identify objects accurately and rapidly. However, replicating this unique capability in robots remains a significant challenge. Here, we present a new form of ultralight multifunctional tactile nano-layered carbon aerogel sensor that provides pressure, temperature, material recognition and 3D location capabilities, which is combined with multimodal supervised learning algorithms for object recognition. The sensor exhibits human-like pressure (0.04–100 kPa) and temperature (21.5–66.2 °C) detection, millisecond response times (11 ms), a pressure sensitivity of 92.22 kPa^(−1) and triboelectric durability of over 6000 cycles. The devised algorithm has universality and can accommodate a range of application scenarios. The tactile system can identify common foods in a kitchen scene with 94.63% accuracy and explore the topographic and geomorphic features of a Mars scene with 100% accuracy. This sensing approach empowers robots with versatile tactile perception to advance future society toward heightened sensing, recognition and intelligence.
Funding: Supported by the National Natural Science Foundation of China (NSFC) (Grant No. 12072217).
Abstract: One objective of developing machine learning (ML)-based material models is to integrate them with well-established numerical methods to solve boundary value problems (BVPs). In the family of ML models, recurrent neural networks (RNNs) have been extensively applied to capture history-dependent constitutive responses of granular materials, but these multiple-step-based neural networks are neither sufficiently efficient nor aligned with the standard finite element method (FEM). Single-step-based neural networks like the multi-layer perceptron (MLP) are an alternative that bypasses the above issues but have to introduce some internal variables to encode complex loading histories. In this work, a novel Frobenius norm-based internal variable, together with a Fourier layer and residual architecture-enhanced MLP model, is crafted to replicate the history-dependent constitutive features of the representative volume element (RVE) for granular materials. The obtained ML models are then seamlessly embedded into the FEM to solve the BVP of a biaxial compression case and a rigid strip footing case. The obtained solutions are comparable to results from FEM-DEM multiscale modelling but achieve significantly improved efficiency. The results demonstrate the applicability of the proposed internal variable in enabling the MLP to capture highly nonlinear constitutive responses of granular materials.
Funding: This work is supported by the National Key R&D Program of China (2017YFB0802900).
Abstract: In recent years, many unknown protocols have been constantly emerging, bringing severe challenges to network security and network management. Existing unknown protocol recognition methods suffer from weak feature extraction ability and cannot mine the discriminating features of the protocol data thoroughly. To address this issue, we propose an unknown application layer protocol recognition method based on deep clustering. Deep clustering, which consists of a deep neural network and a clustering algorithm, can automatically extract the features of the input and cluster the data based on the extracted features. Compared with traditional clustering methods, deep clustering achieves higher clustering accuracy. The proposed method utilizes network-in-network (NIN), channel attention, spatial attention and Bidirectional Long Short-Term Memory (BLSTM) to construct an autoencoder that extracts the spatial-temporal features of the protocol data, and utilizes an unsupervised clustering algorithm to recognize the unknown protocols based on these features. The method first extracts the application layer protocol data from the network traffic and transforms the data into a one-dimensional matrix. Second, the autoencoder is pretrained, the protocol data is compressed into a low-dimensional latent space by the autoencoder, and the initial clustering is performed with K-Means. Finally, the clustering loss is calculated and the classification model is optimized according to the clustering loss. The classification results are obtained once the classification model is optimal. Compared with existing unknown protocol recognition methods, the proposed method utilizes deep clustering to cluster the unknown protocols, and it can mine the key features of the protocol data and recognize the unknown protocols accurately. Experimental results show that the proposed method can effectively recognize the unknown protocols and that its performance is better than that of other methods.
Funding: Funded by the National Science and Technology Council, Taiwan (Grant No. NSTC 112-2121-M-039-001) and by China Medical University (Grant No. CMU112-MF-79).
Abstract: Artificial intelligence (AI) technology has become integral in the realm of medicine and healthcare, particularly in human activity recognition (HAR) applications such as fitness and rehabilitation tracking. This study introduces a robust coupling analysis framework that integrates four AI-enabled models, combining both machine learning (ML) and deep learning (DL) approaches to evaluate their effectiveness in HAR. The analytical dataset comprises 561 features sourced from the UCI-HAR database, forming the foundation for training the models. Additionally, the MHEALTH database is employed to replicate the modeling process for comparative purposes, while inclusion of the WISDM database, renowned for its challenging features, supports the framework’s resilience and adaptability. The ML-based models employ the adaptive neuro-fuzzy inference system (ANFIS), support vector machine (SVM), and random forest (RF) for data training. In contrast, the DL-based model utilizes a one-dimensional convolutional neural network (1dCNN) to automate feature extraction. Furthermore, the recursive feature elimination (RFE) algorithm, which drives an ML-based estimator to eliminate low-participation features, helps identify the optimal features for enhancing model performance. With a meticulous feature-selection process, the best accuracies of the ANFIS, SVM, RF, and 1dCNN models reach around 90%, 96%, 91%, and 93%, respectively. Comparative analysis using the MHEALTH dataset shows that the 1dCNN model reaches perfect accuracy (100%), while the RF, SVM, and ANFIS models equipped with selected features achieve accuracies of 99.8%, 99.7%, and 96.5%, respectively. Finally, when applied to the WISDM dataset, the DL-based and ML-based models attain accuracies of 91.4% and 87.3%, respectively, aligning with prior research findings. In conclusion, the proposed framework yields HAR models with commendable performance metrics, demonstrating its suitability for integration into the healthcare services system through AI-driven applications.
Funding: National Natural Science Foundation of China under Grant No. 61973037 and China Postdoctoral Science Foundation Grant 2022M720419, which provided funds for conducting the experiments.
Abstract: The identification of intercepted radio fuze modulation types is a prerequisite for decision-making in interference systems. However, the electromagnetic environment of modern battlefields is complex, and the signal-to-noise ratio (SNR) of such environments is usually low, which makes it difficult to recognize radio fuzes accurately. To solve this problem, a radio fuze automatic modulation recognition (AMR) method for low-SNR environments is proposed. First, an adaptive denoising algorithm based on data rearrangement and the two-dimensional (2D) fast Fourier transform (FFT) (DR2D) is used to reduce the noise of the intercepted radio fuze intermediate frequency (IF) signal. Then, the textural features of the rearranged data matrix of the denoised IF signal are extracted from the statistical indicator vectors of gray-level co-occurrence matrices (GLCMs), and support vector machines (SVMs) are used for classification. The DR2D-based adaptive denoising algorithm achieves an average correlation coefficient of more than 0.76 for ten fuze types under SNRs of -10 dB and above, which is higher than that of other typical algorithms. The trained SVM classification model achieves an average recognition accuracy of more than 96% on seven modulation types and recognition accuracies of more than 94% on each modulation type under SNRs of -12 dB and above, which represents good AMR performance for radio fuzes under low SNRs.
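The gray-level co-occurrence matrix (GLCM) features mentioned in the abstract above follow a standard construction, sketched here in pure Python. The abstract does not state which offsets, gray-level quantization or statistics the authors use, so the single horizontal offset, the symmetric normalization, and the two statistics shown (contrast and energy) are assumptions for illustration; the toy matrix stands in for a quantized rearranged data matrix.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for one pixel offset."""
    matrix = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                matrix[a][b] += 1  # count the pair (a, b)
                matrix[b][a] += 1  # and its mirror, making the matrix symmetric
    total = sum(map(sum, matrix))
    return [[v / total for v in row] for row in matrix]

def contrast(p):
    """Sum of p[i][j] * (i - j)^2: large when neighboring gray levels differ."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))

def energy(p):
    """Sum of squared probabilities: large for uniform textures."""
    return sum(v * v for row in p for v in row)

# Toy 2-level matrix (illustrative values).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
p = glcm(image, levels=2)
features = [contrast(p), energy(p)]  # a feature vector that could be fed to an SVM
print(features)
```

In practice such statistics are usually computed for several offsets and directions and concatenated before classification.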
Abstract: Advanced Driver Assistance Systems (ADAS) technologies can assist drivers or be part of automatic driving systems to support the driving process and improve the level of safety and comfort on the road. The Traffic Sign Recognition System (TSRS) is one of the most important components of ADAS. Among the challenges with TSRS is recognizing road signs with the highest accuracy and the shortest processing time. Accordingly, this paper introduces a new real-time methodology for recognizing Speed Limit (SL) signs based on a trio of developed modules. Firstly, the Speed Limit Detection (SLD) module uses the Haar Cascade technique to generate a new SL detector in order to localize SL signs within captured frames. Secondly, the Speed Limit Classification (SLC) module, featuring machine learning classifiers alongside a newly developed model called DeepSL, uses a CNN architecture to extract intricate features from speed limit sign images, ensuring efficient and precise recognition. In addition, a new Speed Limit Classifiers Fusion (SLCF) module has been developed by combining the trained ML classifiers and the DeepSL model using the Dempster-Shafer (DS) theory of belief functions and ensemble learning’s voting technique. Through rigorous software and hardware validation processes, the proposed methodology has achieved F1 scores of 99.98% and 99.96% for DS theory and the voting method, respectively. Furthermore, a prototype encompassing all components demonstrates outstanding reliability and efficacy, with processing times of 150 ms on the Raspberry Pi board and 81.5 ms on the Jetson Nano board, marking a significant advancement in TSRS technology.
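The two fusion strategies named in the abstract above, Dempster-Shafer combination and voting, can be sketched generically. This is not the SLCF module: the mass functions, the speed-limit class labels and the function names are illustrative assumptions. Each classifier's output is modeled as a mass function over subsets of the label set, where the full set represents ignorance, and Dempster's rule combines two such functions.

```python
from collections import Counter

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset of labels -> mass) with Dempster's rule."""
    combined, conflict = {}, 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

def majority_vote(predictions):
    """Most frequent label among the classifier predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of two classifiers over three speed-limit classes.
theta = frozenset({"30", "50", "80"})  # full label set, i.e. "unknown"
m1 = {frozenset({"50"}): 0.7, frozenset({"80"}): 0.1, theta: 0.2}
m2 = {frozenset({"50"}): 0.6, frozenset({"30"}): 0.2, theta: 0.2}

fused = dempster_combine(m1, m2)
best = max(fused, key=fused.get)
print(sorted(best), round(fused[best], 4))
print(majority_vote(["50", "50", "80"]))
```

Dempster's rule rewards agreement between sources more strongly than a plain vote does, since the mass on the shared label grows after renormalization, while voting only needs hard labels from each classifier.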
Funding: Funded by the Open Access Initiative of the University of Bremen and the DFG via SuUB Bremen. The authors are thankful to the Deanship of Scientific Research at Najran University for funding this work under the Research Group Funding Program grant code (NU/RG/SERC/13/40).
Abstract: In the field of computer vision and pattern recognition, understanding human activity from images has become a popular research topic. Activity recognition is the process of determining human behavior from an image. Here, we implement an Extended Kalman filter to build an activity recognition system. The proposed method first applies an HSI color transformation to improve the clarity of the image frame, and Gaussian filters are used to minimize noise. Silhouettes are then extracted using a statistical method. We use Binary Robust Invariant Scalable Keypoints (BRISK) and SIFT for feature extraction. The next step is feature discrimination using the Gray Wolf optimizer. After that, the features are input into the Extended Kalman filter and classified into the relevant human activities according to their definitive characteristics. Experiments on the SUB-Interaction and HMDB51 datasets yield recognition rates of 0.88% and 0.86%, respectively.
Abstract: Fine-grained recognition of ships based on remote sensing images is crucial to safeguarding maritime rights and interests and maintaining national security. With the emergence of massive volumes of high-resolution multi-modality images, using such images for fine-grained recognition has become a promising technology. Fine-grained recognition of multi-modality images imposes higher requirements on the dataset samples. The key problem is how to extract and fuse the complementary features of multi-modality images to obtain more discriminative fusion features. The attention mechanism helps the model pinpoint the key information in the image, resulting in a significant improvement in the model's performance. In this paper, a dataset for fine-grained recognition of ships based on visible and near-infrared multi-modality remote sensing images is first proposed, named Dataset for Multimodal Fine-grained Recognition of Ships (DMFGRS). It includes 1,635 pairs of visible and near-infrared remote sensing images divided into 20 categories, collated from digital orthophoto models provided by commercial remote sensing satellites. DMFGRS provides two types of annotation format files, as well as segmentation mask images corresponding to the ship targets. Then, a Multimodal Information Cross-Enhancement Network (MICE-Net), which fuses features of visible and near-infrared remote sensing images, is proposed. In the network, a dual-branch feature extraction and fusion module is designed to obtain more expressive features. The Feature Cross Enhancement Module (FCEM) achieves fusion enhancement of the two modal features by making channel attention and spatial attention work cross-functionally on the feature map. A benchmark is established by evaluating state-of-the-art object recognition algorithms on DMFGRS. In experiments on DMFGRS, MICE-Net reached a precision, recall, mAP0.5, and mAP0.5:0.95 of 87%, 77.1%, 83.8%, and 63.9%, respectively. Extensive experiments demonstrate that the proposed MICE-Net achieves superior performance on DMFGRS. Built on the lightweight YOLO network, the model generalizes well and thus has good potential for application in real-life scenarios.
Abstract: This paper proposes a novel open set recognition method, the Spatial Distribution Feature Extraction Network (SDFEN), to address the problem of electromagnetic signal recognition in an open environment. The spatial distribution feature extraction layer in SDFEN replaces the output of convolutional neural networks with spatial distribution features that, by incorporating class center vectors, focus more on inter-sample information. The designed hybrid loss function considers both intra-class and inter-class distance, thereby enhancing the similarity among samples of the same class and increasing the dissimilarity between samples of different classes during training. Consequently, this method allows unknown classes to occupy a larger region of the feature space. This reduces the possibility of overlap with known class samples and makes the boundaries between known and unknown samples more distinct. Additionally, the feature comparator threshold can be used to reject unknown samples. For signal open set recognition, seven methods, including the proposed method, are applied to two kinds of electromagnetic signal data: modulation signals and real-world emitters. The experimental results demonstrate that the proposed method outperforms the other six methods overall in a simulated open environment. Specifically, compared to the state-of-the-art Openmax method, the proposed method achieves micro-F-measures that are up to 8.87% and 5.25% higher on the two kinds of data, respectively.
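The general idea of rejecting unknown samples by comparing features against class centers with a threshold can be sketched as below. This is not SDFEN itself: the abstract gives no details of the network, the hybrid loss, or the feature comparator, so the sketch substitutes a plain nearest-class-center rule with a Euclidean distance threshold. The feature vectors, class names, threshold value, and function names are hypothetical.

```python
import math


def class_centers(features, labels):
    """Mean feature vector (class center) for each known class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_open_set(x, centers, threshold, unknown="unknown"):
    """Nearest class center, rejected as unknown if it is too far away."""
    label, dist = min(
        ((y, math.dist(x, c)) for y, c in centers.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= threshold else unknown


# Hypothetical 2D features for two known modulation classes.
train_x = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8)]
train_y = ["QPSK", "QPSK", "16QAM", "16QAM"]
centers = class_centers(train_x, train_y)

known = predict_open_set((0.1, 0.0), centers, threshold=1.0)
rejected = predict_open_set((2.5, 2.5), centers, threshold=1.0)
```

The tighter the known classes cluster around their centers and the farther apart the centers lie, the more feature space is left for unknown samples, which is the effect the abstract attributes to its hybrid loss.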
Funding: Supported by the National Philosophy and Social Sciences Foundation (Grant No. 20BTQ065).
Abstract: Sign language, a visual-gestural language used by the deaf and hard-of-hearing community, plays a crucial role in facilitating communication and promoting inclusivity. Sign language recognition (SLR), the process of automatically recognizing and interpreting sign language gestures, has gained significant attention in recent years due to its potential to bridge the communication gap between the hearing impaired and the hearing world. The emergence and continuous development of deep learning techniques have provided inspiration and momentum for advancing SLR. This paper presents a comprehensive and up-to-date analysis of the advancements, challenges, and opportunities in deep learning-based sign language recognition, focusing on the past five years of research. We explore various aspects of SLR, including sign data acquisition technologies, sign language datasets, evaluation methods, and different types of neural networks. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have shown promising results in fingerspelling and isolated sign recognition. However, the continuous nature of sign language poses challenges, leading to the exploration of advanced neural network models such as the Transformer for continuous sign language recognition (CSLR). Despite significant advancements, several challenges remain in the field of SLR. These include expanding sign language datasets, achieving user independence in recognition systems, exploring different input modalities, effectively fusing features, modeling co-articulation, and improving semantic and syntactic understanding. Additionally, developing lightweight network architectures for mobile applications is crucial for practical implementation. By addressing these challenges, we can further advance the field of deep learning for sign language recognition and improve communication for the hearing-impaired community.