Sign language recognition is vital for enhancing communication accessibility among the Deaf and hard-of-hearing communities. In Japan, approximately 360,000 individuals with hearing and speech disabilities rely on Japanese Sign Language (JSL) for communication. However, existing JSL recognition systems have faced significant performance limitations due to the inherent complexities of the task. In response to these challenges, we present a novel JSL recognition system that employs a fusion approach, combining joint skeleton-based handcrafted features with pixel-based deep learning features. Our system incorporates two distinct streams. The first stream extracts handcrafted features, emphasizing the capture of hand and body movements within JSL gestures. The second stream uses deep learning-based transfer learning to capture hierarchical representations of JSL gestures. We then concatenate the handcrafted features of the first stream with the hierarchical features of the second stream to produce multi-level fusion features, aiming to create a comprehensive representation of the JSL gestures. After reducing the dimensionality of the fused features, a feature selection approach and a kernel-based support vector machine (SVM) are used for classification. To assess the effectiveness of our approach, we conducted extensive experiments on our Lab JSL dataset and a publicly available Arabic sign language (ArSL) dataset. Our results show that the fusion approach significantly enhances JSL recognition accuracy and robustness compared to individual feature sets or traditional recognition methods.
This paper proposes a novel open set recognition method, the Spatial Distribution Feature Extraction Network (SDFEN), to address the problem of electromagnetic signal recognition in an open environment. The spatial distribution feature extraction layer in SDFEN replaces the output of convolutional neural networks with spatial distribution features that, by incorporating class center vectors, focus more on inter-sample information. The designed hybrid loss function considers both intra-class distance and inter-class distance, thereby enhancing the similarity among samples of the same class and increasing the dissimilarity between samples of different classes during training. Consequently, this method allows unknown classes to occupy a larger region of the feature space. This reduces the possibility of overlap with known class samples and makes the boundaries between known and unknown samples more distinct. Additionally, the feature comparator threshold can be used to reject unknown samples. For signal open set recognition, seven methods, including the proposed method, are applied to two kinds of electromagnetic signal data: modulation signal and real-world emitter. The experimental results demonstrate that the proposed method outperforms the other six methods overall in a simulated open environment. Specifically, compared to the state-of-the-art OpenMax method, the proposed method achieves up to 8.87% and 5.25% higher micro-F-measures, respectively.
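The rejection step described in the abstract above (class center vectors plus a comparator threshold) can be illustrated with a minimal sketch. This is not SDFEN itself: it assumes features have already been extracted, uses plain Euclidean distance to each class center, and the class names, center coordinates, and threshold value are hypothetical.

```python
import math

def classify_open_set(sample, centers, threshold):
    """Return the label of the nearest class center, or None when the
    sample is farther than `threshold` from every known center."""
    best_label, best_dist = None, math.inf
    for label, center in centers.items():
        d = math.dist(sample, center)  # Euclidean distance (assumed metric)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= threshold else None

# Hypothetical 2-D feature space with two known signal classes.
centers = {"QPSK": (0.0, 0.0), "16QAM": (4.0, 0.0)}

known = classify_open_set((0.5, 0.2), centers, threshold=1.5)    # near the QPSK center
unknown = classify_open_set((2.0, 5.0), centers, threshold=1.5)  # far from both centers
print(known, unknown)
```

A tighter intra-class spread, as encouraged by the hybrid loss, lets a smaller threshold be used without rejecting known samples.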
To address the low recognition rate that traditional methods achieve for complex radar intra-pulse modulation signal types under low signal-to-noise ratio (SNR), this paper proposes an automatic recognition method for complex radar intra-pulse modulation signal types based on a deep residual network. The basic principle of the method is to obtain the time-frequency relationship of the complex radar intra-pulse modulation signal through the short-time Fourier transform (STFT), and then to design an appropriate deep residual network that extracts features from the time-frequency map and recognizes a variety of complex intra-pulse modulation signal types. In addition, to improve the generalization ability of the proposed method, label smoothing and L2 regularization are introduced. The simulation results show that the proposed method achieves a recognition accuracy of more than 95% for complex radar intra-pulse modulation signal types under low SNR (2 dB).
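Label smoothing, mentioned in the abstract above, replaces a one-hot target with a slightly softened distribution. The sketch below shows the standard formulation; the smoothing factor `epsilon=0.1` is an assumption, since the abstract does not state the value used.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Standard label smoothing: y' = (1 - epsilon) * y + epsilon / K,
    where K is the number of classes."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]

# Four hypothetical modulation classes; the true class is index 2.
target = smooth_labels([0.0, 0.0, 1.0, 0.0], epsilon=0.1)
print(target)  # the true class keeps most of the mass; the rest is spread evenly
```

The smoothed target still sums to 1, so it can be used directly with a cross-entropy loss.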
The task of food image recognition, a subset of fine-grained image recognition, must contend with substantial intra-class variation and minimal inter-class differences. These challenges are compounded by the irregular and multi-scale nature of food images. To address these complexities, our study introduces a model based on the ConvNeXt architecture that leverages multiple attention mechanisms and multi-stage local fusion. Our model employs hybrid attention (HA) mechanisms to pinpoint critical discriminative regions within images, substantially mitigating the influence of background noise. It also introduces a multi-stage local fusion (MSLF) module, which establishes long-distance dependencies between feature maps at different stages. This approach facilitates the assimilation of complementary features across scales, significantly strengthening the model’s capacity for feature extraction. Furthermore, we constructed a dataset named Roushi60, which consists of 60 different categories of common meat dishes. Empirical evaluation on the ETH Food-101, ChineseFoodNet, and Roushi60 datasets shows that our model achieves recognition accuracies of 91.12%, 82.86%, and 92.50%, respectively. These figures represent improvements of 1.04%, 3.42%, and 1.36% over the baseline ConvNeXt network and also surpass the performance of most contemporary food image recognition methods, which demonstrates the efficacy of the proposed model.
With the adoption of cutting-edge communication technologies such as 5G/6G systems and the extensive development of devices, crowdsensing systems in the Internet of Things (IoT) are now conducting complicated video analysis tasks such as behaviour recognition. These applications have dramatically increased the diversity of IoT systems. Specifically, behaviour recognition in videos usually requires a combined analysis of the spatial information about objects and information about their dynamic actions in the temporal dimension. In contrast to image-based computer vision tasks that focus on understanding spatial information, behaviour recognition may rely even more on modeling temporal information that contains short-range and long-range motions. However, current solutions fail to jointly and comprehensively analyse short-range motions between adjacent frames and long-range temporal aggregations at large scales in videos. In this paper, we propose a novel behaviour recognition method based on the integration of multigranular (IMG) motion features, which can provide support for deploying video analysis in multimedia IoT crowdsensing systems. In particular, we achieve reliable motion information modeling by integrating a channel attention-based short-term motion feature enhancement module (CSEM) and a cascaded long-term motion feature integration module (CLIM). We evaluate our model on several action recognition benchmarks, such as HMDB51, Something-Something and UCF101. The experimental results demonstrate that our approach outperforms the previous state-of-the-art methods, which confirms its effectiveness and efficiency.
Speech emotion recognition (SER) uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions. The number of features acquired with acoustic analysis is extremely high, so we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system. The proposed algorithm implements multi-objective emotion recognition with the minimum number of selected features and maximum accuracy. First, we use the information gain and Fisher Score to sort the features extracted from signals. Then, we employ a multi-objective ranking method to evaluate these features and assign different importance to them. Features with high rankings have a large probability of being selected. Finally, we propose a repair strategy to address the problem of duplicate solutions in multi-objective feature selection, which can improve the diversity of solutions and avoid falling into local optima. Using random forest and K-nearest neighbor classifiers, four English speech emotion datasets are employed to test the proposed algorithm (MBEO) as well as other multi-objective emotion identification techniques. The results show that it performs well in terms of inverted generational distance, hypervolume, Pareto solutions, and execution time, and that MBEO is appropriate for high-dimensional English SER.
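The two objectives named in the abstract above (fewest selected features, highest accuracy) define a Pareto front of non-dominated solutions. The following sketch shows only that dominance check, not the MBEO optimizer or its repair strategy; the candidate `(n_features, accuracy)` pairs are made-up illustration values.

```python
def dominates(a, b):
    """a and b are (n_features, accuracy) pairs. a dominates b when it is
    no worse on both objectives and strictly better on at least one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

def pareto_front(solutions):
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]

# Hypothetical candidate feature subsets: (number of features, accuracy).
candidates = [(10, 0.80), (20, 0.85), (15, 0.78), (30, 0.85), (5, 0.70)]
front = pareto_front(candidates)
print(front)
```

Here `(15, 0.78)` is dominated by `(10, 0.80)` and `(30, 0.85)` by `(20, 0.85)`, so both are dropped from the front.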
Machine Learning (ML) algorithms play a pivotal role in Speech Emotion Recognition (SER), although they face a formidable obstacle in accurately discerning a speaker’s emotional state. Examining the emotional states of speakers is important in a range of real-time applications, including but not limited to virtual reality, human-robot interaction, emergency centers, and human behavior assessment. Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs. Previous studies on SER have predominantly used short-time characteristics such as Mel Frequency Cepstral Coefficients (MFCCs) because of their ability to capture the periodic nature of audio signals effectively. Although these features help in perceiving and interpreting emotional depictions, MFCCs have some limitations. This study therefore aims to tackle this issue by systematically selecting multiple audio cues, enhancing the classifier model’s efficacy in accurately discerning human emotions. The dataset is taken from the EMO-DB database. Input speech is preprocessed using a 2D Convolutional Neural Network (CNN), which applies convolutional operations to spectrograms, as they afford a visual representation of how the frequency content of the audio signal changes over time. The next step is spectrogram data normalization, which is crucial for Neural Network (NN) training as it aids faster convergence. Then the five auditory features, MFCCs, Chroma, Mel-Spectrogram, Contrast, and Tonnetz, are extracted from the spectrogram sequentially. The aim of feature selection is to retain only the dominant features by excluding the irrelevant ones. In this paper, the Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) techniques were employed to select features from the multiple audio cues. Finally, the feature sets produced by the hybrid feature extraction methods are fed into a deep Bidirectional Long Short Term Memory (Bi-LSTM) network to discern emotions. Since the deep Bi-LSTM can hierarchically learn complex features and increases model capacity by achieving more robust temporal modeling, it is more effective than a shallow Bi-LSTM in capturing the intricate tones of emotional content present in speech signals. The effectiveness and resilience of the proposed SER model were evaluated experimentally by comparing it to state-of-the-art SER techniques. The results indicated that the model achieved accuracy rates of 90.92%, 93%, and 92% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Berlin Database of Emotional Speech (EMO-DB), and Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets, respectively. These findings indicate a marked improvement in the ability to identify emotional depictions in speech, showcasing the potential of the proposed model in advancing the SER field.
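Sequential Forward Selection, as used in the abstract above, greedily adds the feature that most improves a score until a target subset size is reached. The sketch below shows the generic algorithm only; `toy_score`, the per-feature utilities, and the redundancy penalty are hypothetical stand-ins for what would normally be a validated classifier accuracy.

```python
def sequential_forward_selection(n_features, score, k):
    """Greedy SFS: start empty and repeatedly add the single feature
    whose inclusion gives the highest score, until k features are chosen."""
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical scoring function: each feature has a utility, and
# features 1 and 3 are redundant with each other (penalised together).
UTILITY = [0.2, 0.9, 0.5, 0.85]

def toy_score(subset):
    total = sum(UTILITY[f] for f in subset)
    if 1 in subset and 3 in subset:
        total -= 0.8  # redundancy penalty (illustrative)
    return total

chosen = sequential_forward_selection(len(UTILITY), toy_score, k=2)
print(chosen)
```

Feature 3 has the second-highest individual utility, yet SFS picks feature 2 in the second round because it evaluates candidates jointly with what is already selected. Sequential Backward Selection is the mirror image: it starts from the full set and greedily removes features.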
The deployment of vehicle micro-motors has expanded owing to progress in electrification and intelligent technologies. However, some micro-motors may exhibit design deficiencies, component wear, assembly errors, and other imperfections that arise during the design or manufacturing phases. Consequently, these micro-motors may generate anomalous noises during operation, which substantially degrade the comfort of drivers and passengers. Automobile micro-motors exhibit a diverse array of structural variations, which leads to a multitude of distinctive abnormal noises. To identify these diverse forms of abnormal noise, this research presents a novel approach based on a vibro-acoustic fusion-convolutional neural network (VAF-CNN). The method deploys distinct network branches, each capturing different features from the multi-sensor data while accounting for the perceptual characteristics of the human auditory system. The intermediate layer applies adaptive weighting to the multi-sensor features, which calibrates the features from multiple sensors and further refines the features within each branch network. For optimal model efficacy, a feature fusion mechanism is implemented in the final layer. To validate the proposed approach, this paper first applies a data augmentation method inspired by a modified SpecAugment to the dataset of abnormal noise samples, covering scenarios both with and without in-vehicle interior noise, which mitigates the issue of limited sample availability. Comparative evaluations are then carried out, contrasting the performance of the model based on single-sensor data with that of other feature fusion models that rely on multi-sensor data. The experimental results show that the proposed method yields higher recognition accuracy and greater resilience against interference. Moreover, it has notable practical significance in engineering, as it supports the targeted management of noise from vehicle micro-motors.
As the population ages, the number of elderly people living alone is increasing. Using modern Internet of Things technology to monitor the daily behavior of the elderly indoors is therefore a meaningful area of study. Video-based action recognition is easily affected by object occlusion and weak ambient light, resulting in poor recognition performance. This paper therefore proposes an indoor human behavior recognition method based on the fusion of wireless fidelity (Wi-Fi) perception and video features, exploiting the ability of Wi-Fi signals to carry environmental information as they propagate. This paper uses the public WiFi-based activity recognition dataset (WIAR), which contains Wi-Fi channel state information and essential action videos, and extracts video feature vectors and Wi-Fi signal feature vectors from the dataset through a two-stream convolutional neural network and standard statistical algorithms, respectively. The two sets of feature vectors are then fused, and finally action classification and recognition are performed by a support vector machine (SVM). The experiments compare the two-stream network model with the proposed method under three different environments. After adding Wi-Fi signal feature fusion, the accuracy of action recognition improves by 10% on average.
The use of intelligent systems for the automatic recognition of targets in the defence and military fields has increased significantly. The primary advantage of these systems is that they do not need human participation in the target recognition process. This paper uses the particle swarm optimization (PSO) algorithm to select the optimal features in the micro-Doppler signature of sonar targets. The micro-Doppler effect refers to the amplitude/phase modulation imposed on the received signal by rotating parts of a target, such as propellers. Since different targets' geometric and physical properties are not the same, their micro-Doppler signatures differ. This inconsistency can be considered a practical issue (especially in the frequency domain) for sonar target recognition. Although a 128-point fast Fourier transform (FFT) is used for the feature extraction step, not all extracted features contain helpful information. PSO is therefore used to select the most useful features. To evaluate the micro-Doppler signature of sonar targets and the effect of feature selection on sonar target recognition, the simplest and most popular machine learning algorithm, k-nearest neighbor (k-NN), is used; the combination is called k-PSO in this paper because PSO is used for feature selection. The parameters measured are the correct recognition rate, reliability rate, and processing time. The simulation results show that k-PSO achieved a 100% correct recognition rate and reliability rate in 19.35 s when using simulated data at a 15 dB signal-to-noise ratio (SNR) and an angle of 40°. For the experimental dataset obtained from the cavitation tunnel, the correct recognition rate is 98.26% and the reliability rate is 99.69%, in 18.46 s. The k-PSO therefore shows encouraging performance in automatically recognizing sonar targets on experimental datasets and for real-world use.
Human action recognition (HAR) attempts to understand a subject’s behavior and assign a label to each action performed. It is appealing because it has a wide range of applications in computer vision, such as video surveillance and smart cities. Many attempts have been made in the literature to develop an effective and robust framework for HAR. Still, the process remains difficult and may result in reduced accuracy due to several challenges, such as similarity among actions, extraction of essential features, and reduction of irrelevant features. In this work, we propose an end-to-end framework using deep learning and an improved tree seed optimization algorithm for accurate HAR. The proposed design consists of a few significant steps. In the first step, frame preprocessing is performed. In the second step, two pre-trained deep learning models are fine-tuned and trained through deep transfer learning using the preprocessed video frames. In the next step, the deep learning features of both fine-tuned models are fused using a new Parallel Standard Deviation Padding Max Value approach. The fused features are further optimized using an improved tree seed algorithm, and the selected best features are finally classified using machine learning classifiers. The experiment was carried out on five publicly available datasets, including UT-Interaction, Weizmann, KTH, Hollywood, and IXMAS, and achieved higher accuracy than previous techniques.
Shield machines are currently the main tool for underground tunnel construction. Because the underground construction environment is complex and variable, the ground must be identified accurately and in real time during tunnel construction so that the tunnelling parameters can be matched and adjusted to the geological conditions to ensure construction safety. Compared with the traditional method of stratum identification based on staged drilling sampling, the real-time stratum identification method based on construction data has the advantages of low cost and high precision. Because an ultra-large diameter mud-water balance shield machine produces a huge amount of sensor data, the multivariate data features collected by hundreds of sensors must be screened in order to balance the identification time and recognition accuracy of the formation. In response to this problem, this paper proposes a voting-based feature extraction method (VFS), which integrates multiple feature extraction algorithms (FSM) and uses the frequency with which each feature appears across all feature extraction algorithms as the basis for voting. To verify the wide applicability of the method, several commonly used classification models are trained and tested on the resulting effective feature data, with model accuracy and recognition time as evaluation indicators, and the classifier that combines best with VFS is identified. The experimental results on shield machine data from 6 different geological structures show that the average accuracy of the 13 features obtained by VFS, combined with different classification algorithms, is 91%. Among them, the random forest model takes less time and has the highest recognition accuracy, reaching 93%, showing the best compatibility with VFS. The VFS algorithm proposed in this paper therefore has high reliability and wide applicability for stratum identification during tunnel construction, and can be matched with a variety of classifier algorithms. By combining the 13 features selected from the shield machine data with random forest, the construction stratum environment of shield tunnels can be identified well, providing further theoretical guidance for underground engineering construction.
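The voting rule described in the abstract above (count how often each feature is chosen across several selection algorithms, then keep the most frequently chosen ones) can be sketched as follows. The sensor feature names and the three selector outputs are hypothetical, and the alphabetical tie-break is an assumption, since the abstract does not say how ties are resolved.

```python
from collections import Counter

def vote_features(selections, top_k):
    """Count how many selectors picked each feature and return the
    top_k features by vote count (ties broken alphabetically)."""
    votes = Counter(f for chosen in selections for f in set(chosen))
    ranked = sorted(votes, key=lambda f: (-votes[f], f))
    return ranked[:top_k]

# Hypothetical outputs of three different feature selection algorithms.
selections = [
    ["torque", "thrust", "advance_rate"],
    ["torque", "slurry_pressure", "thrust"],
    ["torque", "advance_rate", "cutter_speed"],
]
top = vote_features(selections, top_k=3)
print(top)
```

With these inputs, `torque` receives three votes and `advance_rate` and `thrust` two each, so those three features are retained.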
Gait recognition is an active research area that uses a subject's walking pattern to identify the subject correctly. Human Gait Recognition (HGR) is performed without any cooperation from the individual. In practice, however, it remains a challenging task under diverse walking sequences because of covariant factors such as normal walking and walking while wearing a coat. Over the years, researchers have worked on identifying subjects using different techniques, but there is still room for improvement in accuracy because of these covariant factors. This paper proposes an automated model-free framework for human gait recognition. The proposed method has a few critical steps. First, optical flow-based motion region estimation and dynamic coordinates-based cropping are performed. The second step involves training a fine-tuned pre-trained MobileNetV2 model on both original and optical flow cropped frames; the training is conducted using static hyperparameters. The third step applies a proposed fusion technique known as normal distribution serially fusion. In the fourth step, an improved optimization algorithm is applied to select the best features, which are then classified using a Bi-Layered neural network. Three publicly available datasets, CASIA A, CASIA B, and CASIA C, were used in the experimental process, yielding average accuracies of 99.6%, 91.6%, and 95.02%, respectively. The proposed framework achieves improved accuracy compared to the other methods.
The performance of a speech emotion recognition (SER) system is heavily influenced by the efficacy of its feature extraction techniques. The study was designed to advance the field of SER by optimizing feature extraction techniques, specifically through the incorporation of high-resolution Mel-spectrograms and the expedited calculation of Mel Frequency Cepstral Coefficients (MFCC). This initiative aimed to refine the system’s accuracy by identifying and mitigating the shortcomings commonly found in current approaches. The primary objective was to improve both the sophistication and effectiveness of our SER model, with a focus on the accurate identification of emotions in spoken language. The research employed a dual-strategy approach for feature extraction. First, a rapid computation technique for MFCC was implemented and integrated with a Bi-LSTM layer to optimize the encoding of MFCC features. Second, a pretrained ResNet model was used in conjunction with feature stats pooling and dense layers for the effective encoding of Mel-spectrogram attributes. These two sets of features were processed separately before being combined in a Convolutional Neural Network (CNN) with a dense layer, with the aim of enhancing their representational richness. The model was evaluated using two prominent databases, CMU-MOSEI and RAVDESS, and achieved an accuracy of 93.2% on CMU-MOSEI and 95.3% on RAVDESS. This performance demonstrates the efficacy of the approach, which exceeds the accuracy benchmarks established by traditional models in the field of speech emotion recognition.
Human gait recognition (HGR) is the process of identifying a subject (human) based on their walking pattern. Each subject has a unique walking pattern that cannot be imitated by other subjects. However, gait recognition is not easy, and it becomes more difficult when the subject carries or wears an object such as a bag or coat. This article proposes an automated architecture based on deep features optimization for HGR. To our knowledge, it is the first architecture in which features are fused using multiset canonical correlation analysis (MCCA). In the proposed method, original video frames are processed for all 11 selected angles of the CASIA B dataset and used to train two fine-tuned deep learning models, SqueezeNet and EfficientNet. Deep transfer learning was used to train both fine-tuned models on the selected angles, yielding two new targeted models that were later used for feature engineering. Features are extracted from the deep layer of both fine-tuned models and fused into one vector using MCCA. An improved manta ray foraging optimization algorithm is also proposed to select the best features from the fused feature matrix, which are then classified using a narrow neural network classifier. The experimental process was conducted on all 11 angles of the large multi-view gait dataset (CASIA B) and obtained higher accuracy than the state-of-the-art techniques. Moreover, a detailed confidence interval-based analysis also shows the effectiveness of the proposed architecture for HGR.
BlazePose, which models human body skeletons as spatiotemporal graphs, has achieved strong performance in skeleton-based action identification. The BlazePose system has made skeleton extraction from photos possible on mobile devices. A Spatial-Temporal Graph Convolutional Network (STGCN) can then predict the actions. The STGCN can be improved by simply replacing the skeleton input data with a different set of joints that provide more information about the activity of interest. However, existing approaches require the user to set the graph’s topology manually and then fix it across all input layers and samples. This research shows how to use the Stochastic Fractal Search (SFS)-guided whale optimization algorithm (GWOA). To obtain the best solution for the GWOA, we adopt the SFS diffusion algorithm, which uses a random walk with a Gaussian distribution, a method common to growing systems. Continuous values are converted to binary so that the optimizer can be applied to the feature-selection problem; combined with the BlazePose skeletal topology and stochastic fractal search, this yields a novel implementation of the BlazePose topology for action recognition. In our experiments, we employed the Kinetics and the NTU-RGB+D datasets. The achieved action recognition accuracy is 93.14% in the X-View setting and 96.74% in the X-Sub setting. In addition, the proposed model performs better in numerous statistical tests and analyses, such as the Analysis of Variance (ANOVA), Wilcoxon signed-rank test, histogram, and time analysis.
Congenital heart defects, which account for about 30% of congenital defects, are the most common type. Data show that congenital heart defects have seriously affected the birth rate of healthy newborns. In fetal and neonatal cardiology, medical imaging technology (2D ultrasound, MRI) has proved helpful in detecting congenital defects of the fetal heart and assists sonographers in prenatal diagnosis. Recognizing the 2D fetal heart ultrasonic standard plane (FHUSP) manually is a highly complex task. Compared with manual identification, automatic identification through artificial intelligence can save a lot of time, ensure the efficiency of diagnosis, and improve the accuracy of diagnosis. In this study, features are extracted using texture descriptors (Local Binary Pattern, LBP, and Histogram of Oriented Gradient, HOG) combined with a Bag of Words (BOW) model, and feature fusion is then performed. Finally, a Support Vector Machine (SVM) is used for automatic recognition and classification of FHUSP. The data include 788 standard plane data sets and 448 normal and abnormal plane data sets. Compared with some other methods and with single-feature models, the classification accuracy of our model is clearly improved, with the highest accuracy reaching 87.35%. We also verify the performance of the model on normal and abnormal planes, where the average accuracy in classifying abnormal and normal planes is 84.92%. The experimental results show that this method can effectively classify and predict different FHUSP and can assist sonographers in diagnosing fetal congenital heart disease.
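As a concrete illustration of the Local Binary Pattern texture descriptor named in the abstract above, the sketch below computes the basic 8-neighbor LBP code for the center pixel of a 3×3 patch. The clockwise bit order starting at the top-left neighbor and the `>=` comparison are one common convention and are assumptions here; the abstract does not specify which LBP variant was used.

```python
def lbp_code(patch):
    """Basic 3x3 LBP: each of the 8 neighbors contributes one bit,
    set when the neighbor is >= the center pixel."""
    center = patch[1][1]
    # Neighbors in clockwise order, starting at the top-left corner.
    neighbors = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2],
        patch[2][2], patch[2][1], patch[2][0],
        patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << bit
    return code

# Hypothetical grey-level patch.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch))
```

In a full pipeline, the codes of all pixels in an image region are typically collected into a histogram, which then serves as the texture feature vector.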
Emotion recognition based on facial expressions is one of the most critical elements of human-machine interfaces. Most conventional methods for emotion recognition using facial expressions use the entire facial image to extract features and then recognize specific emotions through a pre-trained model. In contrast, this paper proposes a novel feature vector extraction method using the Euclidean distance between the landmarks that change position according to facial expressions, especially around the eyes, eyebrows, nose, and mouth. Then, we apply a new classifier using an ensemble network to increase emotion recognition accuracy. The emotion recognition performance was compared with conventional algorithms using public databases. The results indicated that the proposed method achieved higher accuracy than traditional methods for emotion recognition based on facial expressions. In particular, our experiments with the FER2013 database show that our proposed method is robust to lighting conditions and backgrounds, with an average of 25% higher performance than previous studies. Consequently, the proposed method is expected to recognize facial expressions, especially fear and anger, to help prevent severe accidents by detecting security-related or dangerous actions in advance.
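The landmark-distance idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which landmark pairs are used, so the all-pairs choice, the function name `landmark_distance_features`, and the coordinates below are assumptions made only for illustration.

```python
import math
from itertools import combinations


def landmark_distance_features(landmarks):
    """Return the Euclidean distance between every pair of (x, y) landmarks."""
    return [math.dist(p, q) for p, q in combinations(landmarks, 2)]


# Hypothetical landmark coordinates in pixels (not taken from the paper).
landmarks = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
features = landmark_distance_features(landmarks)
print(features)
```

A vector like `features` would then be passed to a classifier; as a facial expression moves the landmarks, the pairwise distances change with them.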
Ear recognition is a new kind of biometric identification technology. Feature extraction is a key step in pattern recognition, which determines the accuracy of classification results. Single feature extraction can achieve a high recognition rate under certain conditions, but double feature extraction can overcome the limitations of single feature extraction. In order to improve the accuracy of classification results, this paper proposes a new method of complementary double feature extraction based on Principal Component Analysis (PCA) and Fisherface, and applies it to human ear image recognition. The experiment was carried out on the ear image library provided by the University of Science and Technology Beijing. The results show that the ear recognition rate of the proposed method is significantly higher than that of single feature extraction using PCA, Fisherface, or Independent Component Analysis (ICA) alone.
By combining fractal theory with D-S evidence theory, an algorithm based on the fusion of multi-fractal features is presented. Fractal features are extracted, and a basic probability assignment function is designed. The new algorithm is compared in simulation with the old algorithm based on a single feature and with an algorithm based on a neural network. The results illustrate that the new algorithm is feasible and valid.
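For readers unfamiliar with D-S evidence theory, the sketch below shows Dempster's rule of combination for two basic probability assignments. It is a generic illustration of the standard rule, not the paper's algorithm; the class labels and mass values are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Fuse two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical assignments from two fractal features over classes A and B.
A, B = frozenset({"A"}), frozenset({"B"})
AB = A | B
m1 = {A: 0.6, AB: 0.4}
m2 = {A: 0.5, B: 0.3, AB: 0.2}
fused = dempster_combine(m1, m2)
```

Here the conflicting mass is 0.6 × 0.3 = 0.18, so the remaining products are divided by 0.82 and the fused masses again sum to one.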
Funding: Supported by the Competitive Research Fund of the University of Aizu, Japan.
Abstract: Sign language recognition is vital for enhancing communication accessibility among the Deaf and hard-of-hearing communities. In Japan, approximately 360,000 individuals with hearing and speech disabilities rely on Japanese Sign Language (JSL) for communication. However, existing JSL recognition systems have faced significant performance limitations due to inherent complexities. In response to these challenges, we present a novel JSL recognition system that employs a strategic fusion approach, combining joint skeleton-based handcrafted features and pixel-based deep learning features. Our system incorporates two distinct streams: the first stream extracts crucial handcrafted features, emphasizing the capture of hand and body movements within JSL gestures. Simultaneously, a deep learning-based transfer learning stream captures hierarchical representations of JSL gestures in the second stream. Then, we concatenate the critical information of the first stream and the hierarchical features of the second stream to produce multi-level fusion features, aiming to create a comprehensive representation of the JSL gestures. After reducing the dimensionality of the features, a feature selection approach and a kernel-based support vector machine (SVM) are used for classification. To assess the effectiveness of our approach, we conducted extensive experiments on our Lab JSL dataset and a publicly available Arabic sign language (ArSL) dataset. Our results demonstrate that our fusion approach significantly enhances JSL recognition accuracy and robustness compared to individual feature sets or traditional recognition methods.
Abstract: This paper proposes a novel open set recognition method, the Spatial Distribution Feature Extraction Network (SDFEN), to address the problem of electromagnetic signal recognition in an open environment. The spatial distribution feature extraction layer in SDFEN replaces convolutional output neural networks with the spatial distribution features that focus more on inter-sample information by incorporating class center vectors. The designed hybrid loss function considers both intra-class distance and inter-class distance, thereby enhancing the similarity among samples of the same class and increasing the dissimilarity between samples of different classes during training. Consequently, this method allows unknown classes to occupy a larger space in the feature space. This reduces the possibility of overlap with known class samples and makes the boundaries between known and unknown samples more distinct. Additionally, the feature comparator threshold can be used to reject unknown samples. For signal open set recognition, seven methods, including the proposed method, are applied to two kinds of electromagnetic signal data: modulation signal and real-world emitter. The experimental results demonstrate that the proposed method outperforms the other six methods overall in a simulated open environment. Specifically, compared to the state-of-the-art Openmax method, the novel method achieves up to 8.87% and 5.25% higher micro-F-measures, respectively.
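The abstract does not specify how the feature comparator threshold is applied. One plausible reading, shown here purely as an assumption, is nearest-class-center matching with a distance cutoff; the class names, centers, and threshold value are hypothetical.

```python
import math


def classify_open_set(feature, class_centers, threshold):
    """Return the nearest known class, or "unknown" if it is too far away."""
    label, center = min(
        class_centers.items(), key=lambda kv: math.dist(feature, kv[1])
    )
    return label if math.dist(feature, center) <= threshold else "unknown"


# Hypothetical 2-D class center vectors for two known modulation types.
centers = {"bpsk": (10.0, 0.0), "qpsk": (0.0, 0.0)}
near = classify_open_set((1.0, 0.0), centers, threshold=2.0)
far = classify_open_set((5.0, 5.0), centers, threshold=2.0)
print(near, far)
```

The tighter the known classes cluster around their centers, the smaller the threshold can be, which leaves more of the feature space available for rejecting unknown samples.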
Abstract: In view of the low recognition rate of complex radar intra-pulse modulation signal types by traditional methods under low signal-to-noise ratio (SNR), this paper proposes an automatic recognition method for complex radar intra-pulse modulation signal types based on a deep residual network. The basic principle of the recognition method is to obtain the transformation relationship between the time and frequency of a complex radar intra-pulse modulation signal through the short-time Fourier transform (STFT), and then to design an appropriate deep residual network to extract the features of the time-frequency map and recognize a variety of complex intra-pulse modulation signal types. In addition, in order to improve the generalization ability of the proposed method, label smoothing and L2 regularization are introduced. The simulation results show that the proposed method has a recognition accuracy of more than 95% for complex radar intra-pulse modulation signal types under low SNR (2 dB).
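Label smoothing, mentioned above as a generalization aid, replaces a one-hot target with a slightly softened distribution. The sketch below uses the common uniform formulation, (1 − ε)·one_hot + ε/K; the abstract does not give the paper's ε, so the value 0.1 is an assumption.

```python
def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Return a smoothed target: (1 - epsilon) * one_hot + epsilon / K."""
    off_value = epsilon / num_classes
    return [
        1.0 - epsilon + off_value if k == true_class else off_value
        for k in range(num_classes)
    ]


# Four hypothetical modulation classes, true class index 1.
target = smooth_labels(true_class=1, num_classes=4, epsilon=0.1)
print(target)
```

The smoothed target still sums to one, but the network is no longer pushed toward fully confident predictions.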
Funding: The support of this research by the Hubei Provincial Natural Science Foundation (2022CFB449) and the Science Research Foundation of the Education Department of Hubei Province (B2020061) is gratefully acknowledged.
Abstract: The task of food image recognition, a nuanced subset of fine-grained image recognition, grapples with substantial intra-class variation and minimal inter-class differences. These challenges are compounded by the irregular and multi-scale nature of food images. Addressing these complexities, our study introduces an advanced model that leverages multiple attention mechanisms and multi-stage local fusion, grounded in the ConvNeXt architecture. Our model employs hybrid attention (HA) mechanisms to pinpoint critical discriminative regions within images, substantially mitigating the influence of background noise. Furthermore, it introduces a multi-stage local fusion (MSLF) module, fostering long-distance dependencies between feature maps at varying stages. This approach facilitates the assimilation of complementary features across scales, significantly bolstering the model's capacity for feature extraction. In addition, we constructed a dataset named Roushi60, which consists of 60 different categories of common meat dishes. Empirical evaluation on the ETH Food-101, ChineseFoodNet, and Roushi60 datasets reveals that our model achieves recognition accuracies of 91.12%, 82.86%, and 92.50%, respectively. These figures not only mark an improvement of 1.04%, 3.42%, and 1.36% over the foundational ConvNeXt network but also surpass the performance of most contemporary food image recognition methods. Such advancements underscore the efficacy of our proposed model in navigating the intricate landscape of food image recognition, setting a new benchmark for the field.
Funding: Supported by the National Natural Science Foundation of China under grants No. 62271125 and No. 62273071, by the Sichuan Science and Technology Program (No. 2022YFG0038, No. 2021YFG0018), by the Xinjiang Science and Technology Program (No. 2022273061), and by the Fundamental Research Funds for the Central Universities (No. ZYGX2020ZB034, No. ZYGX2021J019).
Abstract: With the adoption of cutting-edge communication technologies such as 5G/6G systems and the extensive development of devices, crowdsensing systems in the Internet of Things (IoT) are now conducting complicated video analysis tasks such as behaviour recognition. These applications have dramatically increased the diversity of IoT systems. Specifically, behaviour recognition in videos usually requires a combinatorial analysis of the spatial information about objects and information about their dynamic actions in the temporal dimension. Behaviour recognition may even rely more on the modeling of temporal information containing short-range and long-range motions, in contrast to computer vision tasks involving images that focus on understanding spatial information. However, current solutions fail to jointly and comprehensively analyse short-range motions between adjacent frames and long-range temporal aggregations at large scales in videos. In this paper, we propose a novel behaviour recognition method based on the integration of multigranular (IMG) motion features, which can provide support for deploying video analysis in multimedia IoT crowdsensing systems. In particular, we achieve reliable motion information modeling by integrating a channel attention-based short-term motion feature enhancement module (CSEM) and a cascaded long-term motion feature integration module (CLIM). We evaluate our model on several action recognition benchmarks, such as HMDB51, Something-Something and UCF101. The experimental results demonstrate that our approach outperforms the previous state-of-the-art methods, which confirms its effectiveness and efficiency.
Abstract: Speech emotion recognition (SER) uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions. The number of features acquired with acoustic analysis is extremely high, so we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system. The proposed algorithm implements multi-objective emotion recognition with the minimum number of selected features and maximum accuracy. First, we use the information gain and Fisher Score to sort the features extracted from signals. Then, we employ a multi-objective ranking method to evaluate these features and assign different importance to them. Features with high rankings have a large probability of being selected. Finally, we propose a repair strategy to address the problem of duplicate solutions in multi-objective feature selection, which can improve the diversity of solutions and avoid falling into local traps. Using random forest and K-nearest neighbor classifiers, four English speech emotion datasets are employed to test the proposed algorithm (MBEO) as well as other multi-objective emotion identification techniques. The results illustrate that it performs well in inverted generational distance, hypervolume, Pareto solutions, and execution time, and MBEO is appropriate for high-dimensional English SER.
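The Fisher Score used in the filter stage can be sketched for a single feature as the ratio of between-class scatter to within-class scatter. This is the standard textbook definition, assumed here because the abstract gives no formula; the sample values and labels are hypothetical.

```python
from statistics import fmean, pvariance


def fisher_score(values, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    overall_mean = fmean(values)
    between = 0.0
    within = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        between += len(group) * (fmean(group) - overall_mean) ** 2
        within += len(group) * pvariance(group)
    return between / within if within > 0 else float("inf")


# Hypothetical acoustic features for six utterances from two emotion classes.
labels = [0, 0, 0, 1, 1, 1]
separating = fisher_score([1, 2, 3, 7, 8, 9], labels)
uninformative = fisher_score([1, 9, 5, 2, 8, 5], labels)
print(separating, uninformative)
```

Features are then sorted by score in descending order, so the class-separating feature above would rank ahead of the uninformative one.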
Abstract: Machine Learning (ML) algorithms play a pivotal role in Speech Emotion Recognition (SER), although they encounter a formidable obstacle in accurately discerning a speaker's emotional state. The examination of the emotional states of speakers holds significant importance in a range of real-time applications, including but not limited to virtual reality, human-robot interaction, emergency centers, and human behavior assessment. Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs. Previous studies on SER have predominantly utilized short-time characteristics such as Mel Frequency Cepstral Coefficients (MFCCs) due to their ability to capture the periodic nature of audio signals effectively. Although these traits may improve the ability to perceive and interpret emotional depictions appropriately, MFCCs have some limitations. This study therefore aims to tackle the aforementioned issue by systematically picking multiple audio cues, enhancing the classifier model's efficacy in accurately discerning human emotions. The utilized dataset is taken from the EMO-DB database. Input speech is preprocessed using a 2D Convolutional Neural Network (CNN), which applies convolutional operations to spectrograms, as they afford a visual representation of the way the frequency content of the audio signal changes over time. The next step is spectrogram data normalization, which is crucial for Neural Network (NN) training as it aids faster convergence. Then the five auditory features, MFCCs, Chroma, Mel-Spectrogram, Contrast, and Tonnetz, are extracted from the spectrogram sequentially. The aim of feature selection is to retain only the dominant features by excluding the irrelevant ones. In this paper, the Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) techniques were employed for feature selection over the multiple audio cues. Finally, the feature sets composed from the hybrid feature extraction methods are fed into a deep Bidirectional Long Short Term Memory (Bi-LSTM) network to discern emotions. Since the deep Bi-LSTM can hierarchically learn complex features and increases model capacity by achieving more robust temporal modeling, it is more effective than a shallow Bi-LSTM in capturing the intricate tones of emotional content in speech signals. The effectiveness and resilience of the proposed SER model were evaluated by experiments, comparing it to state-of-the-art SER techniques. The results indicated that the model achieved accuracy rates of 90.92%, 93%, and 92% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Berlin Database of Emotional Speech (EMO-DB), and The Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets, respectively. These findings signify a prominent enhancement in the ability to identify emotional depictions in speech, showcasing the potential of the proposed model in advancing the SER field.
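Sequential Forward Selection can be sketched as a greedy loop that repeatedly adds the feature giving the best subset score. In practice the score would be a validation accuracy; the `toy_score` function and its numbers below are hypothetical stand-ins so that the example is self-contained.

```python
def sequential_forward_selection(candidates, score, k):
    """Greedily grow a feature subset of size k using a subset scoring function."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        # Try each remaining feature and keep the one that helps most.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical per-feature gains; MFCC and Mel-spectrogram are treated as redundant.
GAINS = {"mfcc": 0.5, "chroma": 0.2, "mel": 0.45, "contrast": 0.1, "tonnetz": 0.05}


def toy_score(subset):
    total = sum(GAINS[f] for f in subset)
    if "mfcc" in subset and "mel" in subset:
        total -= 0.4  # assumed redundancy penalty, for illustration only
    return total


chosen = sequential_forward_selection(GAINS, toy_score, k=2)
print(chosen)
```

Sequential Backward Selection follows the same pattern in reverse: it starts from the full set and removes the feature whose removal hurts the score least.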
Funding: The author received funding from the Sichuan Natural Science Foundation (2022NSFSC1892).
Abstract: The deployment of vehicle micro-motors has expanded owing to progress in electrification and intelligent technologies. However, some micro-motors may exhibit design deficiencies, component wear, assembly errors, and other imperfections that arise during the design or manufacturing phases. Consequently, these micro-motors might generate anomalous noises during operation, exerting a substantial adverse influence on the overall comfort of drivers and passengers. Automobile micro-motors exhibit a diverse array of structural variations, leading to a multitude of distinctive auditory irregularities. To address the identification of diverse forms of abnormal noise, this research presents a novel approach based on a vibro-acoustic fusion-convolutional neural network (VAF-CNN). This method deploys distinct network branches, each capturing different features from the multi-sensor data, while considering the auditory perception traits of the human auditory system. The intermediary layer integrates adaptive weighting of multi-sensor features, affording a calibration mechanism for the features from multiple sensors and enabling a further refinement of features within the branch network. For optimal model efficacy, a feature fusion mechanism is implemented in the concluding layer. To substantiate the efficacy of the proposed approach, this paper first employs a data augmentation methodology inspired by modified SpecAugment, applied to the dataset of abnormal noise samples, encompassing scenarios both with and without in-vehicle interior noise. This serves to mitigate the issue of limited sample availability. Subsequent comparative evaluations contrast the performance of the model founded upon single-sensor data against other feature fusion models reliant on multi-sensor data. The experimental results substantiate that the suggested methodology yields heightened recognition accuracy and greater resilience against interference. Moreover, it holds notable practical significance in the engineering domain, as it furnishes valuable support for the targeted management of noise emanating from vehicle micro-motors.
Funding: Supported by the National Natural Science Foundation of China (No. 62006135) and the Natural Science Foundation of Shandong Province (No. ZR2020QF116).
Abstract: With the intensifying aging of the population, the phenomenon of the elderly living alone is also increasing. Therefore, using modern Internet of Things technology to monitor the daily behavior of the elderly indoors is a meaningful study. Video-based action recognition tasks are easily affected by object occlusion and weak ambient light, resulting in poor recognition performance. Therefore, this paper proposes an indoor human behavior recognition method based on wireless fidelity (Wi-Fi) perception and video feature fusion by utilizing the ability of Wi-Fi signals to carry environmental information during propagation. This paper uses the public WiFi-based activity recognition dataset (WIAR), containing Wi-Fi channel state information and essential action videos, and extracts video feature vectors and Wi-Fi signal feature vectors from the dataset through a two-stream convolutional neural network and standard statistical algorithms, respectively. The two sets of feature vectors are then fused, and finally, action classification and recognition are performed by a support vector machine (SVM). The experiments compare the two-stream network model with the method in this paper under three different environments, and the accuracy of action recognition after adding Wi-Fi signal feature fusion is improved by 10% on average.
Abstract: Currently, the use of intelligent systems for the automatic recognition of targets in the fields of defence and military has increased significantly. The primary advantage of these systems is that they do not need human participation in target recognition processes. This paper uses the particle swarm optimization (PSO) algorithm to select the optimal features in the micro-Doppler signature of sonar targets. The micro-Doppler effect refers to amplitude/phase modulation of the received signal by rotating parts of a target, such as propellers. Since the geometric and physical properties of different targets are not the same, their micro-Doppler signatures differ. This inconsistency can be considered a practical issue (especially in the frequency domain) for sonar target recognition. Despite using a 128-point fast Fourier transform (FFT) for the feature extraction step, not all extracted features contain helpful information. As a result, PSO selects the most optimal and valuable features. To evaluate the micro-Doppler signature of sonar targets and the effect of feature selection on sonar target recognition, the simplest and most popular machine learning algorithm, k-nearest neighbor (k-NN), is used, which is called k-PSO in this paper because of the use of PSO for feature selection. The parameters measured are the correct recognition rate, reliability rate, and processing time. The simulation results show that k-PSO achieved a 100% correct recognition rate and reliability rate at 19.35 s when using simulated data at a 15 dB signal-to-noise ratio (SNR) and an angle of 40°. Also, for the experimental dataset obtained from the cavitation tunnel, the correct recognition rate is 98.26%, and the reliability rate is 99.69% at 18.46 s. Therefore, k-PSO shows encouraging performance in automatically recognizing sonar targets when using experimental datasets and for real-world use.
Funding: Supported by the "Human Resources Program in Energy Technology" of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), granted financial resources from the Ministry of Trade, Industry & Energy, Republic of Korea (No. 20204010600090).
Abstract: Human action recognition (HAR) attempts to understand a subject's behavior and assign a label to each action performed. It is appealing because it has a wide range of applications in computer vision, such as video surveillance and smart cities. Many attempts have been made in the literature to develop an effective and robust framework for HAR. Still, the process remains difficult and may result in reduced accuracy due to several challenges, such as similarity among actions, extraction of essential features, and reduction of irrelevant features. In this work, we propose an end-to-end framework using deep learning and an improved tree seed optimization algorithm for accurate HAR. The proposed design consists of a few significant steps. In the first step, frame preprocessing is performed. In the second step, two pre-trained deep learning models are fine-tuned and trained through deep transfer learning using preprocessed video frames. In the next step, deep learning features of both fine-tuned models are fused using a new Parallel Standard Deviation Padding Max Value approach. The fused features are further optimized using an improved tree seed algorithm, and the selected best features are finally classified using machine learning classifiers. The experiment was carried out on five publicly available datasets, including UT-Interaction, Weizmann, KTH, Hollywood, and IXAMS, and achieved higher accuracy than previous techniques.
Funding: Supported by the National Natural Science Foundation of China and Shanxi Coal-based Low Carbon Joint Fund (Grant No. U1910211), the National Natural Science Foundation of China (Grant Nos. 51975024 and 52105044), and the National Key Research and Development Project (Grant No. 2019YFC0121700).
Abstract: Shield machines are currently the main tool for underground tunnel construction. Due to the complexity and variability of the underground construction environment, it is necessary to accurately identify the ground in real time during tunnel construction, so that the tunnelling parameters can be matched and adjusted to the geological conditions to ensure construction safety. Compared with the traditional method of stratum identification based on staged drilling sampling, the real-time stratum identification method based on construction data has the advantages of low cost and high precision. Due to the huge amount of sensor data of the ultra-large diameter mud-water balance shield machine, in order to balance the identification time and recognition accuracy of the formation, it is necessary to screen the multivariate data features collected by hundreds of sensors. In response to this problem, this paper proposes a voting-based feature extraction method (VFS), which integrates multiple feature extraction algorithms (FSM) and uses the frequency with which each feature appears across all feature extraction algorithms as the basis for voting. At the same time, in order to verify the wide applicability of the method, several commonly used classification models are trained and tested on the obtained effective feature data, with model accuracy and recognition time used as evaluation indicators, and the classifier that combines best with VFS is identified. The experimental results on shield machine data from 6 different geological structures show that the average accuracy of the 13 features obtained by VFS combined with different classification algorithms is 91%; among them, the random forest model takes less time and has the highest recognition accuracy, reaching 93%, showing the best compatibility with VFS. Therefore, the VFS algorithm proposed in this paper has high reliability and wide applicability for stratum identification in the process of tunnel construction, and can be matched with a variety of classifier algorithms. By combining the 13 features selected from the shield machine data with random forest, the construction stratum environment of shield tunnels can be identified well, providing further theoretical guidance for underground engineering construction.
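The voting step described in this abstract can be sketched as counting how often each feature is chosen across several selectors and keeping those that reach a vote threshold. The abstract does not name the individual selectors or the cutoff, so the sensor names, the three selections, and `min_votes` below are hypothetical.

```python
from collections import Counter


def vote_features(selections, min_votes):
    """Keep features chosen by at least min_votes of the selection algorithms."""
    votes = Counter(f for chosen in selections for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical outputs of three feature selection algorithms on shield sensor data.
selections = [
    ["thrust", "torque", "speed"],
    ["torque", "pressure"],
    ["thrust", "torque", "pressure"],
]
kept = vote_features(selections, min_votes=2)
print(kept)
```

Using `set(chosen)` means a selector that lists a feature twice still casts only one vote for it.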
Funding: Supported by the "Human Resources Program in Energy Technology" of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), granted financial resources from the Ministry of Trade, Industry & Energy, Republic of Korea (No. 20204010600090).
Abstract: Gait recognition is an active research area that uses a subject's walking pattern to identify the subject correctly. Human Gait Recognition (HGR) is performed without any cooperation from the individual. However, in practice, it remains a challenging task under diverse walking sequences due to covariant factors such as normal walking and walking while wearing a coat. Researchers, over the years, have worked on successfully identifying subjects using different techniques, but there is still room for improvement in accuracy due to these covariant factors. This paper proposes an automated model-free framework for human gait recognition. There are a few critical steps in the proposed method. Firstly, optical flow-based motion region estimation and dynamic coordinates-based cropping are performed. The second step involves training a fine-tuned pre-trained MobileNetV2 model on both original and optical flow cropped frames; the training is conducted using static hyperparameters. The third step applies a proposed fusion technique known as normal distribution serially fusion. In the fourth step, a better optimization algorithm is applied to select the best features, which are then classified using a Bi-Layered neural network. Three publicly available datasets, CASIA A, CASIA B, and CASIA C, were used in the experimental process, yielding average accuracies of 99.6%, 91.6%, and 95.02%, respectively. The proposed framework achieves improved accuracy compared to the other methods.
Funding: Supported by the GRRC program of Gyeonggi Province (GRRC-Gachon2023(B02), Development of AI-based medical service technology).
Abstract: The performance of a speech emotion recognition (SER) system is heavily influenced by the efficacy of its feature extraction techniques. The study was designed to advance the field of SER by optimizing feature extraction techniques, specifically through the incorporation of high-resolution Mel-spectrograms and the expedited calculation of Mel Frequency Cepstral Coefficients (MFCC). This initiative aimed to refine the system's accuracy by identifying and mitigating the shortcomings commonly found in current approaches. Ultimately, the primary objective was to elevate both the intricacy and effectiveness of our SER model, with a focus on augmenting its proficiency in the accurate identification of emotions in spoken language. The research employed a dual-strategy approach for feature extraction. Firstly, a rapid computation technique for MFCC was implemented and integrated with a Bi-LSTM layer to optimize the encoding of MFCC features. Secondly, a pretrained ResNet model was utilized in conjunction with feature Stats pooling and dense layers for the effective encoding of Mel-spectrogram attributes. These two sets of features underwent separate processing before being combined in a Convolutional Neural Network (CNN) outfitted with a dense layer, with the aim of enhancing their representational richness. The model was rigorously evaluated using two prominent databases: CMU-MOSEI and RAVDESS. Notable findings include an accuracy rate of 93.2% on the CMU-MOSEI database and 95.3% on the RAVDESS database. Such exceptional performance underscores the efficacy of this innovative approach, which not only meets but also exceeds the accuracy benchmarks established by traditional models in the field of speech emotion recognition.
Funding: Supported by the MSIT (Ministry of Science and ICT), Korea, under the ICAN (ICT Challenge and Advanced Network of HRD) program (IITP-2022-2020-0-01832) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation), and by the Soonchunhyang University Research Fund.
Abstract: Human gait recognition (HGR) is the process of identifying a subject (human) based on their walking pattern. Each subject has a unique walking pattern that cannot be simulated by other subjects. However, gait recognition is not easy, and it becomes more difficult for the system if the subject carries or wears an object, such as a bag or coat. This article proposes an automated architecture based on deep features optimization for HGR. To our knowledge, it is the first architecture in which features are fused using multiset canonical correlation analysis (MCCA). In the proposed method, original video frames are processed for all 11 selected angles of the CASIA B dataset and utilized to train two fine-tuned deep learning models, Squeezenet and Efficientnet. Deep transfer learning was used to train both fine-tuned models on the selected angles, yielding two new targeted models that were later used for feature engineering. Features are extracted from the deep layer of both fine-tuned models and fused into one vector using MCCA. An improved manta ray foraging optimization algorithm is also proposed to select the best features from the fused feature matrix, which are then classified using a narrow neural network classifier. The experimental process was conducted on all 11 angles of the large multi-view gait dataset (CASIA B) and obtained better accuracy than the state-of-the-art techniques. Moreover, a detailed confidence interval-based analysis also shows the effectiveness of the proposed architecture for HGR.
Abstract: BlazePose, which models human body skeletons as spatiotemporal graphs, has achieved strong performance in skeleton-based action identification. Skeleton extraction from photos on mobile devices has been made possible by the BlazePose system. A Spatial-Temporal Graph Convolutional Network (STGCN) can then forecast the actions. The STGCN can be improved by simply replacing the skeleton input data with a different set of joints that provide more information about the activity of interest. On the other hand, existing approaches require the user to manually set the graph's topology and then fix it across all input layers and samples. This research shows how to use the Stochastic Fractal Search (SFS)-Guided whale optimization algorithm (GWOA). To get the best solution for the GWOA, we adopt the SFS diffusion algorithm, which uses the random walk with a Gaussian distribution method common to growing systems. Continuous values are transformed into binary to apply to the feature-selection problem in conjunction with the BlazePose skeletal topology and stochastic fractal search to construct a novel implementation of the BlazePose topology for action recognition. In our experiments, we employed the Kinetics and the NTU-RGB+D datasets. The achieved action accuracy in the X-View is 93.14% and in the X-Sub is 96.74%. In addition, the proposed model performs better in numerous statistical tests such as the Analysis of Variance (ANOVA), Wilcoxon signed-rank test, histogram, and times analysis.
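The step of turning continuous optimizer values into a binary feature mask is commonly done with a sigmoid transfer function. The abstract does not state which transfer function or threshold the authors use, so the sigmoid, the fixed 0.5 cutoff, and the joint names below are assumptions for illustration.

```python
import math


def binarize_position(position, threshold=0.5):
    """Map each continuous value to 1 if sigmoid(value) >= threshold, else 0."""
    return [1 if 1.0 / (1.0 + math.exp(-x)) >= threshold else 0 for x in position]


# Hypothetical continuous search-agent position, one value per candidate joint.
joints = ["nose", "left_wrist", "right_wrist", "left_hip"]
position = [-2.0, 0.3, 1.5, -0.1]
mask = binarize_position(position)
selected = [j for j, bit in zip(joints, mask) if bit]
print(mask, selected)
```

Each optimizer candidate is evaluated on the joints its mask keeps, which lets a continuous search method such as a whale optimizer address a discrete feature-selection problem.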
Funding: Supported by the Fujian Provincial Science and Technology Major Project (No. 2020HZ02014), by grants from the National Natural Science Foundation of Fujian (2021J01133, 2021J011404), and by the Quanzhou Scientific and Technological Planning Projects (Nos. 2018C113R, 2019C028R, 2019C029R, 2019C076R and 2019C099R).
Abstract: Congenital heart defects, which account for about 30% of congenital defects, are the most common type. Data show that congenital heart defects have seriously affected the birth rate of healthy newborns. In fetal and neonatal cardiology, medical imaging technology (2D ultrasound, MRI) has proved helpful for detecting congenital defects of the fetal heart and assists sonographers in prenatal diagnosis. Manually recognizing the 2D fetal heart ultrasonic standard plane (FHUSP) is a highly complex task. Compared with manual identification, automatic identification through artificial intelligence can save a lot of time, ensure the efficiency of diagnosis, and improve the accuracy of diagnosis. In this study, a feature extraction method based on texture features (Local Binary Pattern, LBP, and Histogram of Oriented Gradient, HOG) combined with the Bag of Words (BOW) model is carried out, and then feature fusion is performed. Finally, a Support Vector Machine (SVM) is adopted to realize automatic recognition and classification of FHUSP. The data include 788 standard plane data sets and 448 normal and abnormal plane data sets. Compared with some other methods and with single-feature models, the classification accuracy of our model is clearly improved, with the highest accuracy reaching 87.35%. Similarly, we also verify the performance of the model on normal and abnormal planes, where the average classification accuracy is 84.92%. The experimental results show that this method can effectively classify and predict different FHUSP and can provide certain assistance for sonographers in diagnosing fetal congenital heart disease.
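To make the LBP texture feature concrete, the sketch below computes the basic 8-bit Local Binary Pattern code for the centre pixel of a 3x3 patch. It illustrates the standard operator only and is not the authors' pipeline: the clockwise neighbour ordering, the `>=` comparison, the function name `lbp_code`, and the pixel values are assumptions chosen for the example.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of the centre pixel of a 3x3 patch.

    Each neighbour contributes a 1 bit when its intensity is greater than
    or equal to the centre intensity.
    """
    center = patch[1][1]
    # Neighbours visited clockwise from the top-left corner.
    # The starting point and direction are a convention (assumed here).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(offsets):
        if patch[row][col] >= center:
            code |= 1 << bit
    return code


# Hypothetical grey-level values for one 3x3 neighbourhood.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch))  # 241
```

In a full texture descriptor, the codes of all pixels in an image region would typically be collected into a histogram, which is the vector that a later fusion or classification stage consumes.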
Funding: Supported by the Healthcare AI Convergence R&D Program through the National IT Industry Promotion Agency of Korea (NIPA), funded by the Ministry of Science and ICT (No. S0102-23-1007), and by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2017R1A6A1A03015496).
Abstract: Emotion recognition based on facial expressions is one of the most critical elements of human-machine interfaces. Most conventional methods for emotion recognition from facial expressions use the entire facial image to extract features and then recognize specific emotions through a pre-trained model. In contrast, this paper proposes a novel feature vector extraction method that uses the Euclidean distance between landmarks whose positions change with facial expressions, especially around the eyes, eyebrows, nose, and mouth. Then, we apply a new classifier using an ensemble network to increase emotion recognition accuracy. The emotion recognition performance was compared with conventional algorithms using public databases. The results indicated that the proposed method achieved higher accuracy than traditional facial-expression-based emotion recognition methods. In particular, our experiments with the FER2013 database show that our proposed method is robust to lighting conditions and backgrounds, with performance 25% higher on average than previous studies. Consequently, the proposed method is expected to recognize facial expressions, especially fear and anger, and thereby help prevent severe accidents by detecting security-related or dangerous actions in advance.
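The idea of building a feature vector from Euclidean distances between facial landmarks can be sketched as below. This is a minimal example under stated assumptions: the abstract does not say which landmark pairs are used, so the sketch takes all pairs, and the function name `landmark_distance_features` and the coordinates are hypothetical.

```python
import math
from itertools import combinations


def landmark_distance_features(landmarks):
    """Return the Euclidean distance between every pair of (x, y) landmarks.

    Assumption: all pairs are used; a real system may select a subset,
    for example only pairs around the eyes, eyebrows, nose, and mouth.
    """
    return [math.dist(p, q) for p, q in combinations(landmarks, 2)]


# Hypothetical landmark coordinates in pixels (illustrative values only).
landmarks = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(landmark_distance_features(landmarks))  # [5.0, 10.0, 5.0]
```

For n landmarks this yields n(n-1)/2 distances, so the feature vector stays small compared with using the entire facial image as input.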
Funding: National Key R&D Program of China (No. 2019YFD0901605).
Abstract: Ear recognition is an emerging biometric identification technology. Feature extraction is a key step in pattern recognition, and it determines the accuracy of the classification results. Single feature extraction can achieve a high recognition rate under certain conditions, but double feature extraction can overcome its limitations. To improve the accuracy of classification results, this paper proposes a complementary double feature extraction method based on Principal Component Analysis (PCA) and Fisherface and applies it to human ear image recognition. The experiment was carried out on the ear image library provided by the University of Science and Technology Beijing. The results show that the ear recognition rate of the proposed method is significantly higher than that of single feature extraction using PCA, Fisherface, or Independent Component Analysis (ICA) alone.
Abstract: By combining fractal theory with D-S evidence theory, an algorithm based on the fusion of multi-fractal features is presented. Fractal features are extracted, and a basic probability assignment function is designed. The new algorithm is compared, through simulation, with the old algorithm based on a single feature and with an algorithm based on a neural network. The results of the comparison and simulation show that the new algorithm is feasible and valid.
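As an illustration of how D-S evidence theory fuses evidence from two features, the sketch below applies Dempster's rule of combination to two basic probability assignments. It shows the general rule only: the abstract does not describe how the authors design their assignment function, so the mass values, the hypothesis labels, and the function name `dempster_combine` are assumptions for the example.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two basic probability assignments with Dempster's rule.

    Each assignment maps a frozenset of hypotheses to its mass.
    Mass falling on an empty intersection counts as conflict and is
    removed by renormalizing the remaining masses.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


# Hypothetical masses from two fractal features over target classes A and B.
A, B = frozenset({"A"}), frozenset({"B"})
EITHER = A | B  # mass left uncommitted between A and B
m1 = {A: 0.6, B: 0.1, EITHER: 0.3}
m2 = {A: 0.5, B: 0.2, EITHER: 0.3}

fused = dempster_combine(m1, m2)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

In this example both sources lean toward class A, and the fused mass for A (about 0.76) is higher than either source assigned on its own, which is the effect that motivates fusing several features rather than relying on one.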