Journal Articles
1,554 articles found
1. Faster Region Convolutional Neural Network (FRCNN) Based Facial Emotion Recognition
Authors: J. Sheril Angel, A. Diana Andrushia, T. Mary Neebha, Oussama Accouche, Louai Saker, N. Anand. Computers, Materials & Continua (SCIE, EI), 2024, No. 5, pp. 2427-2448 (22 pages).
Facial emotion recognition (FER) has become a focal point of research due to its widespread applications, ranging from human-computer interaction to affective computing. While traditional FER techniques have relied on handcrafted features and classification models trained on image or video datasets, recent strides in artificial intelligence and deep learning (DL) have ushered in more sophisticated approaches. The research aims to develop a FER system using a Faster Region Convolutional Neural Network (FRCNN) and design a specialized FRCNN architecture tailored for facial emotion recognition, leveraging its ability to capture spatial hierarchies within localized regions of facial features. The proposed work enhances the accuracy and efficiency of facial emotion recognition and comprises two major key components: Inception V3-based feature extraction and FRCNN-based emotion categorization. Extensive experimentation on Kaggle datasets validates the effectiveness of the proposed strategy, showcasing the FRCNN approach's resilience and accuracy in identifying and categorizing facial expressions. The model's overall performance metrics are compelling, with an accuracy of 98.4%, precision of 97.2%, and recall of 96.31%. This work introduces a perceptive deep learning-based FER method, contributing to the evolving landscape of emotion recognition technologies. The high accuracy and resilience demonstrated by the FRCNN approach underscore its potential for real-world applications. This research advances the field of FER and presents a compelling case for the practicality and efficacy of deep learning models in automating the understanding of facial emotions.
Keywords: facial emotions, FRCNN, deep learning, emotion recognition, face, CNN
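The two-stage design above, region-based detection plus emotion categorization, maps naturally onto an off-the-shelf Faster R-CNN. Below is a minimal sketch, not the authors' implementation: it repurposes torchvision's pretrained detector by swapping in a box predictor for seven emotion classes plus background; the class count, label indices, and toy data are assumptions.

```python
# Hedged sketch: adapting torchvision's Faster R-CNN so each detected face
# region is classified into one of seven emotions (an assumed label set).
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 8  # 7 emotions + background (assumption, not from the paper)

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# Training step on one image: targets hold face boxes and emotion labels.
model.train()
images = [torch.rand(3, 224, 224)]
targets = [{"boxes": torch.tensor([[30.0, 40.0, 180.0, 200.0]]),
            "labels": torch.tensor([3])}]  # e.g., 3 = "happy" (assumed index)
losses = model(images, targets)           # dict of detection/classification losses
sum(losses.values()).backward()
```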
2. Multi-Objective Equilibrium Optimizer for Feature Selection in High-Dimensional English Speech Emotion Recognition
Authors: Liya Yue, Pei Hu, Shu-Chuan Chu, Jeng-Shyang Pan. Computers, Materials & Continua (SCIE, EI), 2024, No. 2, pp. 1957-1975 (19 pages).
Speech emotion recognition (SER) uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions. The number of features acquired with acoustic analysis is extremely high, so we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system. The proposed algorithm implements multi-objective emotion recognition with the minimum number of selected features and maximum accuracy. First, we use the information gain and Fisher score to sort the features extracted from signals. Then, we employ a multi-objective ranking method to evaluate these features and assign different importance to them. Features with high rankings have a large probability of being selected. Finally, we propose a repair strategy to address the problem of duplicate solutions in multi-objective feature selection, which can improve the diversity of solutions and avoid falling into local traps. Using random forest and K-nearest neighbor classifiers, four English speech emotion datasets are employed to test the proposed algorithm (MBEO) as well as other multi-objective emotion identification techniques. The results illustrate that it performs well in inverted generational distance, hypervolume, Pareto solutions, and execution time, and MBEO is appropriate for high-dimensional English SER.
Keywords: speech emotion recognition, filter-wrapper, high-dimensional, feature selection, equilibrium optimizer, multi-objective
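The filter stage described in this abstract ranks features by information gain and Fisher score before the wrapper search runs. A hedged sketch of that ranking step on synthetic data follows; averaging the two rank lists is an assumption, as the paper's multi-objective ranking is more elaborate.

```python
# Hedged sketch of the filter stage: rank features by Fisher score and
# information gain, then average the two rank positions.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def fisher_score(X, y):
    """Per-feature Fisher score: between-class over within-class variance."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / (den + 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))          # 200 utterances, 50 acoustic features
y = rng.integers(0, 4, size=200)        # 4 emotion classes

ig = mutual_info_classif(X, y, random_state=0)   # information gain proxy
fs = fisher_score(X, y)
# Lower mean rank = more important; selection probabilities follow the ranks.
mean_rank = (np.argsort(np.argsort(-ig)) + np.argsort(np.argsort(-fs))) / 2.0
print(np.argsort(mean_rank)[:10])       # ten most promising features
```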
3. E2E-MFERC: A Multi-Face Expression Recognition Model for Group Emotion Assessment
Authors: Lin Wang, Juan Zhao, Hu Song, Xiaolong Xu. Computers, Materials & Continua (SCIE, EI), 2024, No. 4, pp. 1105-1135 (31 pages).
In smart classrooms, conducting multi-face expression recognition based on existing hardware devices to assess students' group emotions can provide educators with more comprehensive and intuitive classroom effect analysis, thereby continuously promoting the improvement of teaching quality. However, most existing multi-face expression recognition methods adopt a multi-stage approach, with an overall complex process, poor real-time performance, and insufficient generalization ability. In addition, the existing facial expression datasets are mostly single face images, which are of low quality and lack specificity, also restricting the development of this research. This paper aims to propose an end-to-end high-performance multi-face expression recognition algorithm model suitable for smart classrooms, construct a high-quality multi-face expression dataset to support algorithm research, and apply the model to group emotion assessment to expand its application value. To this end, we propose an end-to-end multi-face expression recognition algorithm model for smart classrooms (E2E-MFERC). In order to provide high-quality and highly targeted data support for model research, we constructed a multi-face expression dataset in real classrooms (MFED), containing 2,385 images and a total of 18,712 expression labels, collected from smart classrooms. In constructing E2E-MFERC, we introduce the Re-parameterization Visual Geometry Group (RepVGG) block and symmetric positive definite convolution (SPD-Conv) modules to enhance representational capability; combined with the cross stage partial network fusion module optimized by an attention mechanism (C2f_Attention), this strengthens the ability to extract key information. The model adopts asymptotic feature pyramid network (AFPN) feature fusion tailored to classroom scenes and optimizes the head prediction output size, achieving high-performance end-to-end multi-face expression detection. Finally, we apply the model to smart classroom group emotion assessment and provide design references for classroom effect analysis evaluation metrics. Experiments based on MFED show that the mAP and F1-score of E2E-MFERC on classroom evaluation data reach 83.6% and 0.77, respectively, improving the mAP of same-scale You Only Look Once version 5 (YOLOv5) and You Only Look Once version 8 (YOLOv8) by 6.8% and 2.5%, respectively, and the F1-score by 0.06 and 0.04, respectively. The E2E-MFERC model has obvious advantages in both detection speed and accuracy, which can meet the practical needs of real-time multi-face expression analysis in classrooms and serve the application of teaching effect assessment very well.
Keywords: multi-face expression recognition, smart classroom, end-to-end detection, group emotion assessment
4. Multimodal emotion recognition in the metaverse era: New needs and transformation in mental health work
Authors: Yan Zeng, Jun-Wen Zhang, Jian Yang. World Journal of Clinical Cases (SCIE), 2024, No. 34, pp. 6674-6678 (5 pages).
This editorial comments on an article recently published by López del Hoyo et al. The metaverse, hailed as "the successor to the mobile Internet", is undoubtedly one of the most fashionable terms in recent years. Although metaverse development is a complex and multifaceted evolutionary process influenced by many factors, it is almost certain that it will significantly impact our lives, including mental health services. Like any other technological advancement, the metaverse era presents a double-edged sword for mental health work, which must clearly understand the needs and transformations of its target audience. In this editorial, our primary focus is to contemplate potential new needs and transformations in mental health work during the metaverse era from the perspective of multimodal emotion recognition.
Keywords: multimodal emotion recognition, metaverse, needs, transformation, mental health
5. Emotional speaker recognition based on prosody transformation (Cited by 1)
Authors: Song Peng, Zhao Li, Zou Cairong. Journal of Southeast University (English Edition) (EI, CAS), 2011, No. 4, pp. 357-360 (4 pages).
A novel emotional speaker recognition system (ESRS) is proposed to compensate for emotion variability. First, emotion recognition is adopted as a pre-processing step to classify neutral and emotional speech. Then, the recognized emotional speech is adjusted by prosody modification. Different methods, including Gaussian normalization, the Gaussian mixture model (GMM) and support vector regression (SVR), are adopted to define the mapping rules of F0s between emotional and neutral speech, and the average linear ratio is used for the duration modification. Finally, the modified emotional speech is employed for speaker recognition. The experimental results show that the proposed ESRS can significantly improve the performance of emotional speaker recognition, and the identification rate (IR) is higher than that of the traditional recognition system. The emotional speech with F0 and duration modifications is closer to the neutral one.
Keywords: emotion recognition, speaker recognition, F0 transformation, duration modification
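Of the three F0 mapping rules the paper lists, Gaussian normalization is the simplest: shift and scale an emotional F0 contour so its mean and variance match the speaker's neutral statistics. A minimal sketch (the variable names and the voiced-frame handling are assumptions):

```python
# Hedged sketch: Gaussian normalization of an F0 contour from emotional
# speech toward the speaker's neutral F0 statistics.
import numpy as np

def normalize_f0(f0_emotional, neutral_mean, neutral_std):
    """Map emotional F0 to neutral statistics; unvoiced frames (0 Hz) kept."""
    voiced = f0_emotional > 0
    mu, sigma = f0_emotional[voiced].mean(), f0_emotional[voiced].std()
    out = f0_emotional.copy()
    out[voiced] = (f0_emotional[voiced] - mu) / sigma * neutral_std + neutral_mean
    return out

f0 = np.array([0.0, 210.0, 230.0, 250.0, 0.0, 240.0])  # toy contour, Hz
print(normalize_f0(f0, neutral_mean=150.0, neutral_std=20.0))

# Duration is modified by an average linear ratio, e.g. a global time-scale
# factor r = mean(neutral durations) / mean(emotional durations).
```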
6. Speech emotion recognition using semi-supervised discriminant analysis
Authors: Xu Xinzhou, Huang Chengwei, Jin Yun, Wu Chen, Zhao Li. Journal of Southeast University (English Edition) (EI, CAS), 2014, No. 1, pp. 7-12 (6 pages).
Semi-supervised discriminant analysis (SDA), which uses a combination of multiple embedding graphs, and kernel SDA (KSDA) are adopted in supervised speech emotion recognition. When the emotional factors of speech signal samples are preprocessed, different categories of features, including pitch, zero-cross rate, energy, duration, formant and Mel frequency cepstrum coefficient (MFCC), as well as their statistical parameters, are extracted from the utterances of the samples. In the dimensionality reduction stage, before the feature vectors are sent into classifiers, parameter-optimized SDA and KSDA are performed to reduce dimensionality. Experiments on the Berlin speech emotion database show that SDA for supervised speech emotion recognition outperforms some other state-of-the-art dimensionality reduction methods based on spectral graph learning, such as linear discriminant analysis (LDA), locality preserving projections (LPP) and marginal Fisher analysis (MFA), when multi-class support vector machine (SVM) classifiers are used. Additionally, KSDA can achieve better recognition performance based on kernelized data mapping compared with the above methods, including SDA.
Keywords: speech emotion recognition, speech emotion feature, semi-supervised discriminant analysis, dimensionality reduction
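SDA itself requires graph-Laplacian machinery, but the surrounding pipeline, supervised dimensionality reduction followed by a multi-class SVM, is easy to reproduce. A hedged sketch with LDA standing in for the paper's SDA/KSDA, on synthetic features:

```python
# Hedged sketch: dimensionality reduction before a multi-class SVM, with
# LDA standing in for the paper's (kernel) SDA.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 120))      # 300 utterances, 120 acoustic statistics
y = rng.integers(0, 7, size=300)     # 7 Berlin-database emotion classes

clf = make_pipeline(StandardScaler(),
                    LinearDiscriminantAnalysis(n_components=6),  # <= classes-1
                    SVC(kernel="rbf"))
print(cross_val_score(clf, X, y, cv=5).mean())
```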
7. Novel feature fusion method for speech emotion recognition based on multiple kernel learning
Authors: Jin Yun, Song Peng, Zheng Wenming, Zhao Li. Journal of Southeast University (English Edition) (EI, CAS), 2013, No. 2, pp. 129-133 (5 pages).
In order to improve the performance of speech emotion recognition, a novel feature fusion method is proposed. Based on the global features, the local information of different kinds of features is utilized, and both the global and the local features are combined together. Moreover, the multiple kernel learning method is adopted: the global features and each kind of local feature are respectively associated with a kernel, and all these kernels are added together with different weights to obtain a mixed kernel for nonlinear mapping. In the reproducing kernel Hilbert space, different kinds of emotional features can be easily classified. In the experiments, the popular Berlin dataset is used, and the optimal parameters of the global and the local kernels are determined by cross-validation. After computing using multiple kernel learning, the weights of all the kernels are obtained, which shows that the formant and intensity features play a key role in speech emotion recognition. The classification results show that the recognition rate is 78.74% by using the global kernel and 81.10% by using the proposed method, which demonstrates the effectiveness of the proposed method.
Keywords: speech emotion recognition, multiple kernel learning, feature fusion, support vector machine
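The fusion rule described above, a weighted sum of per-feature-group kernels fed to an SVM, can be sketched with a precomputed-kernel SVM. Fixed example weights stand in for the learned MKL weights, which the paper obtains by optimization:

```python
# Hedged sketch: mix one kernel per feature group with weights, then train
# an SVM on the combined (precomputed) kernel matrix.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
groups = {"global": rng.normal(size=(150, 40)),   # global statistics
          "formant": rng.normal(size=(150, 10)),  # one local feature group
          "intensity": rng.normal(size=(150, 8))}
y = rng.integers(0, 7, size=150)

weights = {"global": 0.5, "formant": 0.3, "intensity": 0.2}  # assumed, not learned
K = sum(w * rbf_kernel(groups[name]) for name, w in weights.items())

svm = SVC(kernel="precomputed").fit(K, y)
print(svm.score(K, y))   # training-set score; real use needs a held-out kernel
```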
8. Dimensional emotion recognition in whispered speech signal based on cognitive performance evaluation
Authors: Wu Chenjian, Huang Chengwei, Chen Hong. Journal of Southeast University (English Edition) (EI, CAS), 2015, No. 3, pp. 311-319 (9 pages).
Cognitive performance-based dimensional emotion recognition in whispered speech is studied. First, whispered speech emotion databases and data collection methods are compared, and the characteristics of emotion expression in whispered speech are studied, especially the basic types of emotions. Secondly, the emotion features for whispered speech are analyzed, and by reviewing the latest references, the related valence features and arousal features are provided. The effectiveness of valence and arousal features in whispered speech emotion classification is studied. Finally, the Gaussian mixture model is studied and applied to whispered speech emotion recognition. Cognitive performance is also considered in emotion recognition so that the recognition errors of whispered speech emotion can be corrected; based on the cognitive scores, the emotion recognition results can be improved. The results show that the formant features are not significantly related to the arousal dimension, while the short-term energy features are related to emotion changes in the arousal dimension. Using the cognitive scores, the recognition results can be improved.
Keywords: whispered speech, emotion recognition, emotion dimensional space
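The GMM classifier the paper applies can be sketched compactly: fit one mixture per emotion on that emotion's training frames and label a test utterance by the highest total log-likelihood. The mixture size and feature shape here are assumptions:

```python
# Hedged sketch: one GMM per emotion class; classify by max log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
emotions = ["neutral", "sad", "angry"]
train = {e: rng.normal(loc=i, size=(500, 13))        # frames x MFCC dims (toy)
         for i, e in enumerate(emotions)}

models = {e: GaussianMixture(n_components=8, random_state=0).fit(X)
          for e, X in train.items()}

def classify(utterance_frames):
    """Sum frame log-likelihoods under each emotion's GMM; pick the best."""
    scores = {e: m.score_samples(utterance_frames).sum()
              for e, m in models.items()}
    return max(scores, key=scores.get)

test = rng.normal(loc=2, size=(80, 13))              # frames resembling "angry"
print(classify(test))
```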
9. A Facial Expression Emotion Recognition Based Human-robot Interaction System (Cited by 5)
Authors: Zhentao Liu, Min Wu, Weihua Cao, Luefeng Chen, Jianping Xu, Ri Zhang, Mengtian Zhou, Junwei Mao. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2017, No. 4, pp. 668-676 (9 pages).
A facial expression emotion recognition based human-robot interaction (FEER-HRI) system is proposed, for which a four-layer system framework is designed. The FEER-HRI system enables the robots not only to recognize human emotions, but also to generate facial expressions for adapting to human emotions. A facial emotion recognition method based on 2D-Gabor, the uniform local binary pattern (LBP) operator, and a multiclass extreme learning machine (ELM) classifier is presented, which is applied to real-time facial expression recognition for robots. Facial expressions of robots are represented by simple cartoon symbols and displayed by an LED screen equipped in the robots, which can be easily understood by humans. Four scenarios, i.e., guiding, entertainment, home service and scene simulation, are performed in the human-robot interaction experiment, in which smooth communication is realized by facial expression recognition of humans and facial expression generation of robots within 2 seconds. As a few prospective applications, the FEER-HRI system can be applied in home service, smart home, safe driving, and so on.
Keywords: emotion generation, facial expression emotion recognition (FER), human-robot interaction (HRI), system design
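The multiclass ELM classifier in this pipeline is a single hidden layer with random, untrained input weights; only the output weights are solved in closed form via a pseudo-inverse. A minimal sketch on stand-in LBP-style feature vectors:

```python
# Hedged sketch of a multiclass extreme learning machine (ELM):
# random hidden layer, closed-form (pseudo-inverse) output weights.
import numpy as np

class ELM:
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def fit(self, X, y):
        n_classes = y.max() + 1
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)            # random hidden features
        T = np.eye(n_classes)[y]                    # one-hot targets
        self.beta = np.linalg.pinv(H) @ T           # least-squares solution
        return self

    def predict(self, X):
        return (np.tanh(X @ self.W + self.b) @ self.beta).argmax(axis=1)

rng = np.random.default_rng(1)
X, y = rng.normal(size=(400, 59)), rng.integers(0, 7, size=400)  # toy LBP histograms
print((ELM().fit(X, y).predict(X) == y).mean())     # training accuracy
```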
10. Single-trial EEG-based emotion recognition using temporally regularized common spatial pattern
Authors: Cheng Minmin, Lu Zuhong, Wang Haixian. Journal of Southeast University (English Edition) (EI, CAS), 2015, No. 1, pp. 55-60 (6 pages).
This study addresses the problem of classifying emotional words based on recorded electroencephalogram (EEG) signals by the single-trial EEG classification technique. Emotional two-character Chinese words are used as experimental materials. Positive words versus neutral words and negative words versus neutral words are classified, respectively, using the induced EEG signals. The method of temporally regularized common spatial patterns (TRCSP) is chosen to extract features from the EEG trials, and then single-trial EEG classification is achieved by linear discriminant analysis. Classification accuracies are between 55% and 65%. The statistical significance of the classification accuracies is confirmed by permutation tests, which shows the successful identification of emotional words versus neutral ones. In addition, 10 out of 15 subjects obtain significant classification accuracy for negative words versus neutral words, while only 4 are significant for positive words versus neutral words, which demonstrates that negative emotions are more easily identified.
Keywords: emotion recognition, temporal regularization, common spatial patterns (CSP), two-character Chinese words, permutation test
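The permutation test used to certify those 55-65% accuracies is straightforward to sketch: refit the classifier on label-shuffled data many times and take the fraction of shuffled runs that match or beat the true accuracy as the p-value. The classifier choice (LDA, as in the paper's pipeline) is grounded; trial counts and features are toy assumptions.

```python
# Hedged sketch: permutation test for the significance of a single-trial
# classification accuracy.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))            # 60 trials x 6 TRCSP features (toy)
y = np.repeat([0, 1], 30)               # emotional vs. neutral labels
X[y == 1] += 0.5                        # inject a weak class difference

lda = LinearDiscriminantAnalysis()
true_acc = cross_val_score(lda, X, y, cv=5).mean()

n_perm = 500
null_accs = np.array([
    cross_val_score(lda, X, rng.permutation(y), cv=5).mean()
    for _ in range(n_perm)])
p_value = (np.sum(null_accs >= true_acc) + 1) / (n_perm + 1)
print(f"accuracy={true_acc:.3f}, p={p_value:.4f}")
```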
11. Transformer-like model with linear attention for speech emotion recognition (Cited by 4)
Authors: Du Jing, Tang Manting, Zhao Li. Journal of Southeast University (English Edition) (EI, CAS), 2021, No. 2, pp. 164-170 (7 pages).
Because of the excellent performance of Transformer in sequence learning tasks, such as natural language processing, an improved Transformer-like model is proposed that is suitable for speech emotion recognition tasks. To alleviate the prohibitive time consumption and memory footprint caused by softmax inside the multi-head attention unit in Transformer, a new linear self-attention algorithm is proposed. The original exponential function is replaced by a Taylor series expansion formula. On the basis of the associative property of matrix products, the time and space complexity of the softmax operation with regard to the input's length is reduced from O(N²) to O(N), where N is the sequence length. Experimental results on the emotional corpora of two languages show that the proposed linear attention algorithm can achieve similar performance to the original scaled dot-product attention, while the training time and memory cost are reduced by half. Furthermore, the improved model obtains more robust performance on speech emotion recognition compared with the original Transformer.
Keywords: Transformer, attention mechanism, speech emotion recognition, fast softmax
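The trick in the abstract: replace exp(q·k) with its first-order Taylor expansion 1 + q·k, so the (KᵀV) term can be computed once and reused for every query, giving O(N) cost. A hedged numpy sketch (single head; unit-normalizing q and k to keep weights non-negative follows common linear-attention practice and is an assumption):

```python
# Hedged sketch: linear attention via first-order Taylor expansion of exp,
# sim(q, k) ~= 1 + q.k, never forming the N x N attention matrix.
import numpy as np

def linear_attention(Q, K, V):
    """O(N) attention using the associativity of matrix products."""
    Q = Q / np.linalg.norm(Q, axis=-1, keepdims=True)
    K = K / np.linalg.norm(K, axis=-1, keepdims=True)
    # Numerator: sum_j (1 + q_i.k_j) v_j = sum_j v_j + q_i @ (K^T V)
    kv = K.T @ V                       # (d, d_v), computed once
    num = V.sum(axis=0) + Q @ kv       # (N, d_v)
    # Denominator: sum_j (1 + q_i.k_j) = N + q_i @ sum_j k_j
    den = K.shape[0] + Q @ K.sum(axis=0)
    return num / den[:, None]

rng = np.random.default_rng(0)
N, d = 1000, 64                        # sequence length, model dimension
Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (1000, 64)
```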
12. Exploring Sequential Feature Selection in Deep Bi-LSTM Models for Speech Emotion Recognition
Authors: Fatma Harby, Mansor Alohali, Adel Thaljaoui, Amira Samy Talaat. Computers, Materials & Continua (SCIE, EI), 2024, No. 2, pp. 2689-2719 (31 pages).
Machine learning (ML) algorithms play a pivotal role in speech emotion recognition (SER), although they encounter a formidable obstacle in accurately discerning a speaker's emotional state. The examination of the emotional states of speakers holds significant importance in a range of real-time applications, including but not limited to virtual reality, human-robot interaction, emergency centers, and human behavior assessment. Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs. Previous studies on SER have predominantly utilized short-time characteristics such as Mel frequency cepstral coefficients (MFCCs) due to their ability to capture the periodic nature of audio signals effectively. Although these traits may improve the ability to perceive and interpret emotional depictions appropriately, MFCCs have some limitations. So this study aims to tackle the aforementioned issue by systematically picking multiple audio cues, enhancing the classifier model's efficacy in accurately discerning human emotions. The utilized dataset is taken from the EMO-DB database; preprocessing of input speech is done using a 2D convolutional neural network (CNN), which involves applying convolutional operations to spectrograms, as they afford a visual representation of the way the audio signal's frequency content changes over time. The next step is spectrogram data normalization, which is crucial for neural network (NN) training as it aids in faster convergence. Then the five auditory features MFCCs, Chroma, Mel-spectrogram, Contrast, and Tonnetz are extracted from the spectrogram sequentially. The aim of feature selection is to retain only dominant features by excluding irrelevant ones. In this paper, the Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) techniques were employed for the selection of multiple audio-cue features. Finally, the feature sets composed from the hybrid feature extraction methods are fed into a deep bidirectional long short-term memory (Bi-LSTM) network to discern emotions. Since the deep Bi-LSTM can hierarchically learn complex features and increases model capacity by achieving more robust temporal modeling, it is more effective than a shallow Bi-LSTM in capturing the intricate tones of emotional content present in speech signals. The effectiveness and resilience of the proposed SER model were evaluated by experiments comparing it to state-of-the-art SER techniques. The results indicated that the model achieved accuracy rates of 90.92%, 93%, and 92% over the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Berlin Database of Emotional Speech (EMO-DB), and Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets, respectively. These findings signify a prominent enhancement in the ability to identify emotional depictions in speech, showcasing the potential of the proposed model in advancing the SER field.
Keywords: artificial intelligence application, multi-features, sequential selection, speech emotion recognition, deep Bi-LSTM
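The five auditory feature families named above are all available in librosa. A hedged sketch of extracting them and pooling frame statistics into one vector per utterance (the mean-pooling and the synthetic signal are assumptions, not the paper's exact recipe):

```python
# Hedged sketch: extract the paper's five feature families with librosa and
# pool each over time to build one utterance-level feature vector.
import numpy as np
import librosa

sr = 22050
y = np.random.default_rng(0).normal(size=sr * 2).astype(np.float32)  # 2 s stand-in signal

feats = np.concatenate([
    librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40).mean(axis=1),
    librosa.feature.chroma_stft(y=y, sr=sr).mean(axis=1),
    librosa.feature.melspectrogram(y=y, sr=sr).mean(axis=1),
    librosa.feature.spectral_contrast(y=y, sr=sr).mean(axis=1),
    librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr).mean(axis=1),
])
print(feats.shape)  # 40 + 12 + 128 + 7 + 6 = 193 dimensions

# SFS/SBS over these cues could then use, e.g.,
# sklearn.feature_selection.SequentialFeatureSelector(direction="forward").
```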
13. Multimodal Emotion Recognition with Transfer Learning of Deep Neural Network (Cited by 2)
Authors: HUANG Jian, LI Ya, TAO Jianhua, YI Jiangyan. ZTE Communications, 2017, No. B12, pp. 23-29 (7 pages).
Due to the lack of large-scale emotion databases, it is hard to obtain comparable improvement in multimodal emotion recognition of the deep neural network by deep learning, which has made great progress in other areas. We use transfer learning to improve its performance with models pretrained on large-scale data. Audio is encoded using deep speech recognition networks trained on 500 hours of speech, and video is encoded using convolutional neural networks trained on over 110,000 images. The extracted audio and visual features are fed into long short-term memory networks to train models respectively. Logistic regression and an ensemble method are performed in decision-level fusion. The experiment results indicate that 1) audio features extracted from deep speech recognition networks achieve better performance than handcrafted audio features; 2) visual emotion recognition obtains better performance than audio emotion recognition; 3) the ensemble method gets better performance than logistic regression, and prior knowledge from the micro-F1 value further improves the performance and robustness, achieving accuracy of 67.00% for "happy", 54.90% for "angry", and 51.69% for "sad".
Keywords: deep neural network, ensemble method, multimodal emotion recognition, transfer learning
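Decision-level fusion as described, logistic regression over the per-modality posteriors, reduces to a few lines. A hedged sketch with synthetic audio and visual class probabilities standing in for the LSTM outputs:

```python
# Hedged sketch: decision-level fusion of audio and visual emotion posteriors
# with logistic regression (stand-in posteriors; the paper's come from LSTMs).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, n_classes = 300, 3                     # happy / angry / sad
y = rng.integers(0, n_classes, size=n)
# Noisy per-modality posteriors that weakly encode the true label.
audio_p = np.eye(n_classes)[y] + rng.normal(scale=0.8, size=(n, n_classes))
video_p = np.eye(n_classes)[y] + rng.normal(scale=0.5, size=(n, n_classes))

fused_in = np.hstack([audio_p, video_p])  # concatenate modality decisions
fusion = LogisticRegression(max_iter=1000).fit(fused_in, y)
print(fusion.score(fused_in, y))          # training accuracy of the fusion layer
```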
14. Multilayer Neural Network Based Speech Emotion Recognition for Smart Assistance (Cited by 2)
Authors: Sandeep Kumar, Mohd Anul Haq, Arpit Jain, C. Andy Jason, Nageswara Rao Moparthi, Nitin Mittal, Zamil S. Alzamil. Computers, Materials & Continua (SCIE, EI), 2023, No. 1, pp. 1523-1540 (18 pages).
Day by day, biometric-based systems play a vital role in our daily lives. This paper proposes an intelligent assistant intended to identify emotions via voice messages. A biometric system has been developed to detect human emotions based on voice recognition and to control a few electronic peripherals for alert actions. The proposed smart assistant aims to provide support to people through buzzer and light emitting diode (LED) alert signals, and it also keeps track of places like households, hospitals and remote areas. The proposed approach is able to detect seven emotions: worry, surprise, neutral, sadness, happiness, hate and love. The key element for the implementation of speech emotion recognition is voice processing, and once the emotion is recognized, the machine interface automatically triggers the alert actions via buzzer and LED. The proposed system is trained and tested on various benchmark datasets, i.e., the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Acoustic-Phonetic Continuous Speech Corpus (TIMIT) and the Emotional Speech Database (Emo-DB), and evaluated based on various parameters, i.e., accuracy, error rate, and time. Compared with existing technologies, the proposed algorithm gave a better error rate and less time: the error rate and time are decreased by 19.79% and 5.13 s for the RAVDESS dataset, 15.77% and 0.01 s for the Emo-DB dataset, and 14.88% and 3.62 s for the TIMIT dataset. The proposed model shows better accuracy of 81.02% for the RAVDESS dataset, 84.23% for the TIMIT dataset and 85.12% for the Emo-DB dataset compared to the Gaussian Mixture Model (GMM) and Support Vector Machine (SVM) models.
Keywords: speech emotion recognition, classifier implementation, feature extraction and selection, smart assistance
15. A Multi-Level Circulant Cross-Modal Transformer for Multimodal Speech Emotion Recognition (Cited by 1)
Authors: Peizhu Gong, Jin Liu, Zhongdai Wu, Bing Han, Y. Ken Wang, Huihua He. Computers, Materials & Continua (SCIE, EI), 2023, No. 2, pp. 4203-4220 (18 pages).
Speech emotion recognition, as an important component of human-computer interaction technology, has received increasing attention. Recent studies have treated emotion recognition of speech signals as a multimodal task, due to its inclusion of the semantic features of two different modalities, i.e., audio and text. However, existing methods often fail to effectively represent features and capture correlations. This paper presents a multi-level circulant cross-modal Transformer (MLCCT) for multimodal speech emotion recognition. The proposed model can be divided into three steps: feature extraction, interaction and fusion. Self-supervised embedding models are introduced for feature extraction, which give a more powerful representation of the original data than those using spectrograms or audio features such as Mel frequency cepstral coefficients (MFCCs) and low-level descriptors (LLDs). In particular, MLCCT contains two types of feature interaction processes, where a bidirectional long short-term memory (Bi-LSTM) with a circulant interaction mechanism is proposed for low-level features, while a two-stream residual cross-modal Transformer block is applied when high-level features are involved. Finally, we choose self-attention blocks for fusion and a fully connected layer to make predictions. To evaluate the performance of our proposed model, comprehensive experiments are conducted on three widely used benchmark datasets, including IEMOCAP, MELD and CMU-MOSEI. The competitive results verify the effectiveness of our approach.
Keywords: speech emotion recognition, self-supervised embedding model, cross-modal Transformer, self-attention
16. Design of Hierarchical Classifier to Improve Speech Emotion Recognition (Cited by 1)
Authors: P. Vasuki. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 1, pp. 19-33 (15 pages).
Automatic speech emotion recognition (SER) is used to recognize emotion from speech automatically. Speech emotion recognition works well in a laboratory environment, but real-time emotion recognition is influenced by variations in the gender, age, and cultural and acoustical background of the speaker. The acoustical resemblance between emotional expressions further increases the complexity of recognition. Many recent research works concentrate on addressing these effects individually. Instead of addressing every influencing attribute individually, we would like to design a system which reduces the effect that arises from any factor. We propose a two-level hierarchical classifier named Interpreter of Responses (IR). The first level of IR has been realized using support vector machine (SVM) and Gaussian mixture model (GMM) classifiers. In the second level of IR, a discriminative SVM classifier has been trained and tested with the meta information of the first-level classifiers along with the input acoustical feature vector which is used in the primary classifiers. To train the system with a corpus of versatile nature, an integrated emotion corpus has been composed using emotion samples of five speech corpora, namely EMO-DB, IITKGP-SESC, the SAVEE corpus, the Spanish emotion corpus and CMU's Woogle corpus. The hierarchical classifier has been trained and tested using MFCC and low-level descriptors (LLD). The empirical analysis shows that the proposed classifier outperforms the traditional classifiers. The proposed ensemble design is very generic and can be adapted even when the number and nature of features change. The first-level classifiers GMM or SVM may be replaced with any other learning algorithm.
Keywords: speech emotion recognition, hierarchical classifier design, ensemble, emotion speech corpora
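The two-level design, first-level SVM and GMM whose outputs (meta information) are concatenated with the original acoustic vector for a second-level SVM, is essentially stacking. A hedged sketch on synthetic features (training-set evaluation only; a proper setup would generate the meta features out-of-fold):

```python
# Hedged sketch of the two-level hierarchy: first-level SVM + per-class GMMs,
# second-level SVM fed with their meta outputs plus the raw feature vector.
import numpy as np
from sklearn.svm import SVC
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X, y = rng.normal(size=(400, 30)), rng.integers(0, 5, size=400)

svm1 = SVC(probability=True).fit(X, y)                 # first-level SVM
gmms = {c: GaussianMixture(n_components=4, random_state=0).fit(X[y == c])
        for c in np.unique(y)}                         # first-level GMMs

def meta_features(X):
    """Concatenate first-level posteriors/log-likelihoods with raw features."""
    svm_post = svm1.predict_proba(X)
    gmm_ll = np.column_stack([g.score_samples(X) for g in gmms.values()])
    return np.hstack([svm_post, gmm_ll, X])

svm2 = SVC().fit(meta_features(X), y)                  # second-level (IR) SVM
print(svm2.score(meta_features(X), y))
```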
17. Multi-head attention-based long short-term memory model for speech emotion recognition (Cited by 1)
Authors: Zhao Yan, Zhao Li, Lu Cheng, Li Sunan, Tang Chuangao, Lian Hailun. Journal of Southeast University (English Edition) (EI, CAS), 2022, No. 2, pp. 103-109 (7 pages).
To make full use of information from different representation subspaces, a multi-head attention-based long short-term memory (LSTM) model is proposed in this study for speech emotion recognition (SER). The proposed model uses frame-level features and takes the temporal information of emotional speech as the input of the LSTM layer. Here, a multi-head time-dimension attention (MHTA) layer is employed to linearly project the output of the LSTM layer into different subspaces for the reduced-dimension context vectors. To provide relatively vital information from other dimensions, the output of MHTA, the output of feature-dimension attention, and the last time-step output of the LSTM are utilized to form multiple context vectors as the input of the fully connected layer. To improve the performance of the multiple vectors, feature-dimension attention is employed on the all-time output of the first LSTM layer. The proposed model was evaluated on the eNTERFACE and GEMEP corpora, respectively. The results indicate that the proposed model outperforms LSTM by 14.6% and 10.5% on eNTERFACE and GEMEP, respectively, proving the effectiveness of the proposed model in SER tasks.
Keywords: speech emotion recognition, long short-term memory (LSTM), multi-head attention mechanism, frame-level features, self-attention
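A compact way to read the MHTA idea: run an LSTM over frame-level features, then let several attention heads, each with its own linear projection, pool the time axis into per-head context vectors. A hedged PyTorch sketch (head count, sizes and class count are assumptions):

```python
# Hedged sketch: multi-head time-dimension attention pooling over LSTM output.
import torch
import torch.nn as nn

class MHTAPool(nn.Module):
    def __init__(self, d_model=128, n_heads=4, d_head=32):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d_model, d_head) for _ in range(n_heads))
        self.score = nn.ModuleList(nn.Linear(d_head, 1) for _ in range(n_heads))

    def forward(self, h):                       # h: (batch, time, d_model)
        ctx = []
        for proj, score in zip(self.proj, self.score):
            s = proj(h)                         # subspace projection
            a = torch.softmax(score(torch.tanh(s)), dim=1)  # weights over time
            ctx.append((a * s).sum(dim=1))      # per-head context vector
        return torch.cat(ctx, dim=-1)           # (batch, n_heads * d_head)

lstm = nn.LSTM(input_size=40, hidden_size=128, batch_first=True)
pool = MHTAPool()
clf = nn.Linear(4 * 32, 6)                      # 6 emotion classes (assumed)

x = torch.randn(8, 300, 40)                     # batch of 300-frame utterances
h, _ = lstm(x)
print(clf(pool(h)).shape)                       # torch.Size([8, 6])
```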
18. Emotional Speech Recognition Based on SVM with GMM Supervector (Cited by 1)
Authors: Chen Yanxiang, Xie Jian. Journal of Electronics (China), 2012, No. 3, pp. 339-344 (6 pages).
Emotion recognition from speech is an important field of research in human-computer interaction. In this letter, the framework of support vector machines (SVM) with a Gaussian mixture model (GMM) supervector is introduced for emotional speech recognition. Because of the importance of variance in reflecting the distribution of speech, normalized mean vectors with the potential to exploit the information from the variance are adopted to form the GMM supervector. Comparative experiments from five aspects are conducted to study their corresponding effects on system performance. The experiment results, which indicate that the influence of the number of mixtures is strong while the influence of duration is weak, provide a basis for the training-set selection of the universal background model (UBM).
Keywords: emotional speech recognition, support vector machines (SVM), Gaussian mixture model (GMM) supervector, universal background model (UBM)
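A GMM supervector in the sense used here: fit a UBM on pooled frames, compute each utterance's posterior-weighted component means, normalize them by the UBM's spread, and stack them into one long vector for the SVM. A hedged sketch (the simple normalization below is one common choice, not necessarily the paper's):

```python
# Hedged sketch: build per-utterance GMM mean supervectors over a UBM,
# then classify them with an SVM.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)
utterances = [rng.normal(size=(rng.integers(80, 200), 13)) for _ in range(60)]
labels = rng.integers(0, 4, size=60)            # toy emotion labels

ubm = GaussianMixture(n_components=16, covariance_type="diag",
                      random_state=0).fit(np.vstack(utterances))

def supervector(frames):
    """Posterior-weighted means per component, normalized by UBM std."""
    post = ubm.predict_proba(frames)            # (T, M) responsibilities
    counts = post.sum(axis=0) + 1e-8            # soft frame counts
    means = (post.T @ frames) / counts[:, None] # (M, 13) adapted means
    return ((means - ubm.means_) / np.sqrt(ubm.covariances_)).ravel()

S = np.array([supervector(u) for u in utterances])
svm = SVC(kernel="linear").fit(S, labels)
print(svm.score(S, labels))
```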
19. Emotion Recognition Using WT-SVM in Human-Computer Interaction (Cited by 2)
Authors: Zequn Wang, Rui Jiao, Huiping Jiang. Journal of New Media, 2020, No. 3, pp. 121-130 (10 pages).
With the continuous development of the computer, people's requirements for computers are getting higher and higher, so the brain-computer interface (BCI) has become an essential part of computer research. Emotion recognition is an important task for the computer to understand social status in BCI. Affective computing (AC) aims to develop models of emotions and advance the affective intelligence of computers. There are various emotion recognition approaches. The method based on electroencephalogram (EEG) is more reliable because it is higher in accuracy and more objective in evaluation than external appearance clues such as emotional expression and gesture. In this paper, we use the wavelet transform (WT) to extract three kinds of EEG features in the time and frequency domains, which are sub-band energy, energy ratio and root mean square of wavelet coefficients. They reflect the emotion-related EEG activities well. The average classification accuracy of the support vector machine (SVM) can reach 82.87%, which indicates that these three features are very effective in emotion recognition. On the other hand, compared with the International Affective Picture System (IAPS), EEG data collected with Chinese Affective Picture System (CAPS) stimulation has a higher emotion recognition rate, indicating that there are cultural background differences in emotions.
Keywords: emotion recognition, pattern recognition, SVM, wavelet transform
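The three wavelet features named above fall out directly from a multilevel discrete wavelet transform. A hedged sketch with PyWavelets (the wavelet family, decomposition level, and synthetic signal are assumptions):

```python
# Hedged sketch: sub-band energy, energy ratio and RMS of wavelet
# coefficients from a multilevel DWT of one EEG channel.
import numpy as np
import pywt

rng = np.random.default_rng(0)
eeg = rng.normal(size=1024)                      # one channel, toy samples

coeffs = pywt.wavedec(eeg, "db4", level=4)       # [cA4, cD4, cD3, cD2, cD1]
energies = np.array([np.sum(c ** 2) for c in coeffs])

sub_band_energy = energies                       # feature 1
energy_ratio = energies / energies.sum()         # feature 2
rms = np.array([np.sqrt(np.mean(c ** 2)) for c in coeffs])  # feature 3

features = np.concatenate([sub_band_energy, energy_ratio, rms])
print(features.shape)                            # 3 features x 5 sub-bands
```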
20. Multimodal Emotion Recognition Based on TCN-Bi-GRU and a Cross-Attention Transformer
Authors: Li Jiahua, Chen Jingxia, Bai Yimin. Journal of Shaanxi University of Science & Technology (《陕西科技大学学报》, Peking University Core Journal), 2025, No. 1, pp. 161-168 (8 pages).
Multimodal speech emotion recognition has attracted considerable attention in natural language processing and machine learning in recent years. Data from different modalities are heterogeneous and inconsistent, so effectively fusing information across modalities and learning efficient representations is a challenge. To this end, this paper proposes a new multimodal speech emotion recognition model based on temporal information modeling and cross-attention. First, a temporal convolutional network (TCN) extracts deep temporal features from speech, text and video data, and a bidirectional gated recurrent unit (Bi-GRU) captures the contextual information of the sequence data, improving the model's ability to understand sequences. Then, a multimodal fusion network is built on a cross-attention mechanism and Transformer to mine and capture the emotional information in the interactions among audio, text and visual features. In addition, elastic net regularization is introduced during training to prevent overfitting, and finally the emotion recognition task is performed. On the IEMOCAP dataset, in classification experiments on the four emotions happy, sad, angry and neutral, the accuracies are 87.6%, 84.1%, 87.5% and 71.5%, and the F1 scores are 85.1%, 84.3%, 87.4% and 71.4%, respectively. The weighted average accuracy is 80.75% and the unweighted average accuracy is 82.80%. The results show that the proposed method achieves good classification performance.
Keywords: speech recognition, multimodal emotion recognition, temporal convolutional network, cross-attention mechanism, elastic net
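The cross-attention fusion this abstract describes, one modality's features querying another's, can be sketched with PyTorch's built-in multi-head attention; the sizes and the audio-queries-text direction are assumptions:

```python
# Hedged sketch: cross-modal attention where audio features attend over
# text features, as one block of a cross-attention fusion network.
import torch
import torch.nn as nn

d_model, n_heads = 128, 4
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
norm = nn.LayerNorm(d_model)

audio = torch.randn(8, 50, d_model)   # (batch, audio steps, dim), e.g. TCN-Bi-GRU output
text = torch.randn(8, 30, d_model)    # (batch, text tokens, dim)

# Queries come from audio; keys/values from text, so audio is enriched
# with text-side emotional cues.
fused, attn_weights = cross_attn(query=audio, key=text, value=text)
fused = norm(audio + fused)           # residual connection + layer norm
print(fused.shape)                    # torch.Size([8, 50, 128])

# Elastic net regularization adds an L1 + L2 penalty on the weights:
# loss = task_loss + l1 * sum(|w|) + l2 * sum(w^2)
```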