Journal Articles
1,499 articles found
Fine-Grained Ship Recognition Based on Visible and Near-Infrared Multimodal Remote Sensing Images: Dataset, Methodology and Evaluation
1
Authors: Shiwen Song, Rui Zhang, Min Hu, Feiyao Huang. Computers, Materials & Continua, SCIE, EI, 2024, Issue 6, pp. 5243-5271 (29 pages)
Fine-grained recognition of ships based on remote sensing images is crucial to safeguarding maritime rights and interests and maintaining national security. Currently, with the emergence of massive high-resolution multi-modality images, the use of multi-modality images for fine-grained recognition has become a promising technology. Fine-grained recognition of multi-modality images imposes higher requirements on the dataset samples. The key to the problem is how to extract and fuse the complementary features of multi-modality images to obtain more discriminative fusion features. The attention mechanism helps the model to pinpoint the key information in the image, resulting in a significant improvement in the model's performance. In this paper, a dataset for fine-grained recognition of ships based on visible and near-infrared multi-modality remote sensing images is first proposed, named the Dataset for Multimodal Fine-grained Recognition of Ships (DMFGRS). It includes 1,635 pairs of visible and near-infrared remote sensing images divided into 20 categories, collated from digital orthophoto models provided by commercial remote sensing satellites. DMFGRS provides two types of annotation format files, as well as segmentation mask images corresponding to the ship targets. Then, a Multimodal Information Cross-Enhancement Network (MICE-Net), fusing features of visible and near-infrared remote sensing images, is proposed. In the network, a dual-branch feature extraction and fusion module has been designed to obtain more expressive features. The Feature Cross Enhancement Module (FCEM) achieves fusion enhancement of the two modal features by making channel attention and spatial attention work cross-functionally on the feature map. A benchmark is established by evaluating state-of-the-art object recognition algorithms on DMFGRS. In experiments on DMFGRS, MICE-Net's precision, recall, mAP0.5 and mAP0.5:0.95 reached 87%, 77.1%, 83.8% and 63.9%, respectively. Extensive experiments demonstrate that the proposed MICE-Net performs better on DMFGRS. Built on the lightweight YOLO network, the model has excellent generalizability and thus good potential for application in real-life scenarios.
Keywords: multi-modality dataset; ship recognition; fine-grained recognition; attention mechanism
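The cross-functional attention idea behind FCEM, where each modality's attention weights modulate the other modality's feature map, can be illustrated with a minimal NumPy sketch. This is a toy stand-in (average-pooling plus a sigmoid in place of learned attention layers), not the paper's actual module; the function names are illustrative.

```python
import numpy as np

def channel_attention(feat):
    # Squeeze spatial dims -> per-channel weights in (0, 1) via a sigmoid.
    pooled = feat.mean(axis=(1, 2))            # (C,)
    return 1.0 / (1.0 + np.exp(-pooled))

def spatial_attention(feat):
    # Squeeze the channel dim -> per-pixel weights in (0, 1).
    pooled = feat.mean(axis=0)                 # (H, W)
    return 1.0 / (1.0 + np.exp(-pooled))

def cross_enhance(vis, nir):
    """Apply each modality's attention to the *other* modality's features."""
    vis_out = vis * channel_attention(nir)[:, None, None] * spatial_attention(nir)
    nir_out = nir * channel_attention(vis)[:, None, None] * spatial_attention(vis)
    return vis_out + nir_out                   # fused (C, H, W) feature map

rng = np.random.default_rng(0)
vis = rng.standard_normal((8, 4, 4))   # visible-band features (C, H, W)
nir = rng.standard_normal((8, 4, 4))   # near-infrared features
fused = cross_enhance(vis, nir)
```

In a real network the pooled descriptors would pass through small learned layers before the sigmoid; the cross-wiring (attention computed from one branch, applied to the other) is the part this sketch demonstrates.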
Faster Region Convolutional Neural Network (FRCNN)-Based Facial Emotion Recognition
2
Authors: J. Sheril Angel, A. Diana Andrushia, T. Mary Neebha, Oussama Accouche, Louai Saker, N. Anand. Computers, Materials & Continua, SCIE, EI, 2024, Issue 5, pp. 2427-2448 (22 pages)
Facial emotion recognition (FER) has become a focal point of research due to its widespread applications, ranging from human-computer interaction to affective computing. While traditional FER techniques have relied on handcrafted features and classification models trained on image or video datasets, recent strides in artificial intelligence and deep learning (DL) have ushered in more sophisticated approaches. This research aims to develop a FER system using a Faster Region Convolutional Neural Network (FRCNN) and to design a specialized FRCNN architecture tailored for facial emotion recognition, leveraging its ability to capture spatial hierarchies within localized regions of facial features. The proposed work enhances the accuracy and efficiency of facial emotion recognition and comprises two major key components: Inception V3-based feature extraction and FRCNN-based emotion categorization. Extensive experimentation on Kaggle datasets validates the effectiveness of the proposed strategy, showcasing the FRCNN approach's resilience and accuracy in identifying and categorizing facial expressions. The model's overall performance metrics are compelling, with an accuracy of 98.4%, precision of 97.2%, and recall of 96.31%. This work introduces a perceptive deep learning-based FER method, contributing to the evolving landscape of emotion recognition technologies. The high accuracy and resilience demonstrated by the FRCNN approach underscore its potential for real-world applications. This research advances the field of FER and presents a compelling case for the practicality and efficacy of deep learning models in automating the understanding of facial emotions.
Keywords: facial emotions; FRCNN; deep learning; emotion recognition; face; CNN
Multi-Objective Equilibrium Optimizer for Feature Selection in High-Dimensional English Speech Emotion Recognition
3
Authors: Liya Yue, Pei Hu, Shu-Chuan Chu, Jeng-Shyang Pan. Computers, Materials & Continua, SCIE, EI, 2024, Issue 2, pp. 1957-1975 (19 pages)
Speech emotion recognition (SER) uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions. The number of features acquired with acoustic analysis is extremely high, so we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system. The proposed algorithm implements multi-objective emotion recognition with the minimum number of selected features and maximum accuracy. First, we use information gain and the Fisher score to sort the features extracted from signals. Then, we employ a multi-objective ranking method to evaluate these features and assign different importance to them. Features with high rankings have a large probability of being selected. Finally, we propose a repair strategy to address the problem of duplicate solutions in multi-objective feature selection, which can improve the diversity of solutions and avoid falling into local traps. Using random forest and K-nearest neighbor classifiers, four English speech emotion datasets are employed to test the proposed algorithm (MBEO) as well as other multi-objective emotion identification techniques. The results illustrate that it performs well in inverted generational distance, hypervolume, Pareto solutions, and execution time, and that MBEO is appropriate for high-dimensional English SER.
Keywords: speech emotion recognition; filter-wrapper; high-dimensional; feature selection; equilibrium optimizer; multi-objective
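The filter stage described above ranks features before any classifier is involved. The Fisher score, one of the two criteria the abstract names, can be computed per feature as between-class scatter over within-class scatter. A minimal NumPy sketch on synthetic data (not the MBEO implementation):

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature Fisher score: between-class scatter / within-class scatter."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)   # small epsilon avoids division by zero

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = rng.standard_normal((100, 3))
X[:, 0] += 3.0 * y                      # feature 0 separates the classes well
scores = fisher_score(X, y)
ranking = np.argsort(scores)[::-1]      # best feature first
```

Features with high scores would then be given a larger probability of entering the wrapper stage, as the abstract describes.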
E2E-MFERC: A Multi-Face Expression Recognition Model for Group Emotion Assessment
4
Authors: Lin Wang, Juan Zhao, Hu Song, Xiaolong Xu. Computers, Materials & Continua, SCIE, EI, 2024, Issue 4, pp. 1105-1135 (31 pages)
In smart classrooms, conducting multi-face expression recognition based on existing hardware devices to assess students' group emotions can provide educators with more comprehensive and intuitive classroom effect analysis, thereby continuously promoting the improvement of teaching quality. However, most existing multi-face expression recognition methods adopt a multi-stage approach, with an overall complex process, poor real-time performance, and insufficient generalization ability. In addition, the existing facial expression datasets are mostly single-face images, which are of low quality and lack specificity, also restricting the development of this research. This paper aims to propose an end-to-end high-performance multi-face expression recognition algorithm model suitable for smart classrooms, construct a high-quality multi-face expression dataset to support algorithm research, and apply the model to group emotion assessment to expand its application value. To this end, we propose an end-to-end multi-face expression recognition algorithm model for smart classrooms (E2E-MFERC). In order to provide high-quality and highly targeted data support for model research, we constructed a multi-face expression dataset in real classrooms (MFED), containing 2,385 images and a total of 18,712 expression labels, collected from smart classrooms. In constructing E2E-MFERC, we introduce Re-parameterization Visual Geometry Group (RepVGG) blocks and symmetric positive definite convolution (SPD-Conv) modules to enhance representational capability; combined with the cross-stage partial network fusion module optimized by an attention mechanism (C2f_Attention), this strengthens the ability to extract key information. The model adopts asymptotic feature pyramid network (AFPN) feature fusion tailored to classroom scenes, optimizes the head prediction output size, and achieves high-performance end-to-end multi-face expression detection. Finally, we apply the model to smart classroom group emotion assessment and provide design references for classroom effect analysis evaluation metrics. Experiments based on MFED show that the mAP and F1-score of E2E-MFERC on classroom evaluation data reach 83.6% and 0.77, respectively, improving the mAP of same-scale You Only Look Once version 5 (YOLOv5) and You Only Look Once version 8 (YOLOv8) by 6.8% and 2.5%, respectively, and the F1-score by 0.06 and 0.04, respectively. The E2E-MFERC model has obvious advantages in both detection speed and accuracy, can meet the practical needs of real-time multi-face expression analysis in classrooms, and serves the application of teaching effect assessment very well.
Keywords: multi-face expression recognition; smart classroom; end-to-end detection; group emotion assessment
Exploring Sequential Feature Selection in Deep Bi-LSTM Models for Speech Emotion Recognition
5
Authors: Fatma Harby, Mansor Alohali, Adel Thaljaoui, Amira Samy Talaat. Computers, Materials & Continua, SCIE, EI, 2024, Issue 2, pp. 2689-2719 (31 pages)
Machine Learning (ML) algorithms play a pivotal role in Speech Emotion Recognition (SER), although they encounter a formidable obstacle in accurately discerning a speaker's emotional state. The examination of the emotional states of speakers holds significant importance in a range of real-time applications, including but not limited to virtual reality, human-robot interaction, emergency centers, and human behavior assessment. Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs. Previous studies on SER have predominantly utilized short-time characteristics such as Mel Frequency Cepstral Coefficients (MFCCs) due to their ability to capture the periodic nature of audio signals effectively. Although these traits may improve the ability to perceive and interpret emotional depictions appropriately, MFCCs have some limitations. This study therefore aims to tackle that issue by systematically picking multiple audio cues, enhancing the classifier model's efficacy in accurately discerning human emotions. The utilized dataset is taken from the EMO-DB database. Preprocessing of input speech is done using a 2D Convolutional Neural Network (CNN), which involves applying convolutional operations to spectrograms, as they afford a visual representation of the way the audio signal's frequency content changes over time. The next step is spectrogram data normalization, which is crucial for Neural Network (NN) training as it aids in faster convergence. Then five auditory features, MFCCs, Chroma, Mel-Spectrogram, Contrast, and Tonnetz, are extracted from the spectrogram sequentially. The aim of feature selection is to retain only dominant features by excluding the irrelevant ones. In this paper, the Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) techniques were employed for multiple-audio-cue feature selection. Finally, the feature sets composed from the hybrid feature extraction methods are fed into a deep Bidirectional Long Short-Term Memory (Bi-LSTM) network to discern emotions. Since a deep Bi-LSTM can hierarchically learn complex features and increases model capacity by achieving more robust temporal modeling, it is more effective than a shallow Bi-LSTM in capturing the intricate tones of emotional content present in speech signals. The effectiveness and resilience of the proposed SER model were evaluated by experiments, comparing it to state-of-the-art SER techniques. The results indicated that the model achieved accuracy rates of 90.92%, 93%, and 92% over the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Berlin Database of Emotional Speech (EMO-DB), and Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets, respectively. These findings signify a prominent enhancement in the ability to identify emotional depictions in speech, showcasing the potential of the proposed model in advancing the SER field.
Keywords: artificial intelligence application; multi-features; sequential selection; speech emotion recognition; deep Bi-LSTM
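Sequential Forward Selection, as used in the entry above, greedily grows a feature subset by repeatedly adding whichever remaining feature most improves a wrapper criterion. A toy sketch on synthetic data, using a nearest-centroid training accuracy as a stand-in criterion (the paper's actual classifier is a deep Bi-LSTM):

```python
import numpy as np

def nearest_centroid_acc(X, y):
    # Toy wrapper criterion: training accuracy of a nearest-centroid classifier.
    labels = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in labels])
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return (labels[d.argmin(axis=1)] == y).mean()

def sfs(X, y, k):
    """Greedy Sequential Forward Selection: add the feature that helps most."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best = max(remaining,
                   key=lambda j: nearest_centroid_acc(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(2)
y = np.repeat([0, 1], 40)
X = rng.standard_normal((80, 5))
X[:, 3] += 4.0 * y                 # only feature 3 is informative here
picked = sfs(X, y, k=2)
```

Sequential Backward Selection is the mirror image: start from all features and repeatedly drop the one whose removal hurts the criterion least.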
Multimodal emotion recognition in the metaverse era: New needs and transformation in mental health work
6
Authors: Yan Zeng, Jun-Wen Zhang, Jian Yang. World Journal of Clinical Cases, SCIE, 2024, Issue 34, pp. 6674-6678 (5 pages)
This editorial comments on an article recently published by López del Hoyo et al. The metaverse, hailed as "the successor to the mobile Internet", is undoubtedly one of the most fashionable terms in recent years. Although metaverse development is a complex and multifaceted evolutionary process influenced by many factors, it is almost certain that it will significantly impact our lives, including mental health services. Like any other technological advancement, the metaverse era presents a double-edged sword for mental health work, which must clearly understand the needs and transformations of its target audience. In this editorial, our primary focus is to contemplate potential new needs and transformations in mental health work during the metaverse era from the perspective of multimodal emotion recognition.
Keywords: multimodal emotion recognition; metaverse; needs; transformation; mental health
Multilayer Neural Network Based Speech Emotion Recognition for Smart Assistance (Cited by 2)
7
Authors: Sandeep Kumar, Mohd Anul Haq, Arpit Jain, C. Andy Jason, Nageswara Rao Moparthi, Nitin Mittal, Zamil S. Alzamil. Computers, Materials & Continua, SCIE, EI, 2023, Issue 1, pp. 1523-1540 (18 pages)
Day by day, biometric-based systems play a vital role in our daily lives. This paper proposes an intelligent assistant intended to identify emotions via voice messages. A biometric system has been developed to detect human emotions based on voice recognition and to control a few electronic peripherals for alert actions. The proposed smart assistant aims to support people through buzzer and light-emitting diode (LED) alert signals, and it also keeps track of places like households, hospitals, remote areas, etc. The proposed approach is able to detect seven emotions: worry, surprise, neutral, sadness, happiness, hate and love. The key element in the implementation of speech emotion recognition is voice processing, and once the emotion is recognized, the machine interface automatically triggers the actions via buzzer and LED. The proposed system is trained and tested on various benchmark datasets, i.e., the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Acoustic-Phonetic Continuous Speech Corpus (TIMIT), and the Emotional Speech database (Emo-DB), and evaluated based on various parameters, i.e., accuracy, error rate, and time. Compared with existing technologies, the proposed algorithm gave a better error rate and less time. Error rate and time are decreased by 19.79% and 5.13 s for the RAVDESS dataset, 15.77% and 0.01 s for the Emo-DB dataset, and 14.88% and 3.62 s for the TIMIT database. The proposed model shows better accuracy of 81.02% for the RAVDESS dataset, 84.23% for the TIMIT dataset and 85.12% for the Emo-DB dataset compared to Gaussian Mixture Model (GMM) and Support Vector Machine (SVM) models.
Keywords: speech emotion recognition; classifier implementation; feature extraction and selection; smart assistance
The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition (Cited by 1)
8
Authors: Mohammad Amaz Uddin, Mohammad Salah Uddin Chowdury, Mayeen Uddin Khandaker, Nissren Tamam, Abdelmoneim Sulieman. Computers, Materials & Continua, SCIE, EI, 2023, Issue 1, pp. 1709-1722 (14 pages)
Human speech indirectly represents the mental state or emotions of others. The use of Artificial Intelligence (AI)-based techniques may bring a revolution in this modern era by recognizing emotion from speech. In this study, we introduce a robust method for emotion recognition from human speech using a well-performing preprocessing technique together with a deep learning-based mixed model consisting of Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) components. About 2,800 audio files were extracted from the Toronto Emotional Speech Set (TESS) database for this study. A high-pass filter and a Savitzky-Golay filter have been used to obtain noise-free as well as smooth audio data. A total of seven types of emotions (angry, disgust, fear, happy, neutral, pleasant surprise, and sad) were used in this study. Energy, fundamental frequency, and Mel Frequency Cepstral Coefficients (MFCC) have been used to extract the emotion features, and these features resulted in 97.5% accuracy in the mixed LSTM+CNN model. This mixed model is found to perform better than the usual state-of-the-art models in emotion recognition from speech. It also indicates that this mixed model could be effectively utilized in advanced research dealing with sound processing.
Keywords: emotion recognition; Savitzky-Golay; fundamental frequency; MFCC; neural networks
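The Savitzky-Golay smoothing used in the preprocessing step above fits a low-order polynomial over a sliding window by least squares and evaluates it at the window centre; the resulting weights reduce noise while preserving signal shape better than a plain moving average. A from-scratch NumPy sketch (in practice one would likely reach for `scipy.signal.savgol_filter`):

```python
import numpy as np

def savgol_coeffs(window, polyorder):
    """Least-squares Savitzky-Golay smoothing weights for the window centre."""
    m = window // 2
    z = np.arange(-m, m + 1)
    A = np.vander(z, polyorder + 1, increasing=True)   # columns: z^0, z^1, ...
    return np.linalg.pinv(A)[0]                        # evaluates the fit at z = 0

def savgol_smooth(x, window=11, polyorder=3):
    h = savgol_coeffs(window, polyorder)
    pad = window // 2
    xp = np.pad(x, pad, mode="edge")                   # edge-pad to keep length
    return np.convolve(xp, h[::-1], mode="valid")

t = np.linspace(0, 1, 200)
clean = np.sin(2 * np.pi * 3 * t)                      # slow underlying signal
rng = np.random.default_rng(3)
noisy = clean + 0.2 * rng.standard_normal(t.size)
smooth = savgol_smooth(noisy)
```

For a signal that varies slowly relative to the window, the smoothed output should sit closer to the clean signal than the noisy input does.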
A Multi-Level Circulant Cross-Modal Transformer for Multimodal Speech Emotion Recognition (Cited by 1)
9
Authors: Peizhu Gong, Jin Liu, Zhongdai Wu, Bing Han, Y. Ken Wang, Huihua He. Computers, Materials & Continua, SCIE, EI, 2023, Issue 2, pp. 4203-4220 (18 pages)
Speech emotion recognition, as an important component of human-computer interaction technology, has received increasing attention. Recent studies have treated emotion recognition of speech signals as a multimodal task, due to its inclusion of the semantic features of two different modalities, i.e., audio and text. However, existing methods often fail to effectively represent features and capture correlations. This paper presents a multi-level circulant cross-modal Transformer (MLCCT) for multimodal speech emotion recognition. The proposed model can be divided into three steps: feature extraction, interaction and fusion. Self-supervised embedding models are introduced for feature extraction, which give a more powerful representation of the original data than those using spectrograms or audio features such as Mel-frequency cepstral coefficients (MFCCs) and low-level descriptors (LLDs). In particular, MLCCT contains two types of feature interaction processes, where a bidirectional Long Short-Term Memory (Bi-LSTM) network with a circulant interaction mechanism is proposed for low-level features, while a two-stream residual cross-modal Transformer block is applied when high-level features are involved. Finally, we choose self-attention blocks for fusion and a fully connected layer to make predictions. To evaluate the performance of our proposed model, comprehensive experiments are conducted on three widely used benchmark datasets, including IEMOCAP, MELD and CMU-MOSEI. The competitive results verify the effectiveness of our approach.
Keywords: speech emotion recognition; self-supervised embedding model; cross-modal transformer; self-attention
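The core operation inside a cross-modal Transformer block is scaled dot-product attention in which one modality supplies the queries and the other supplies the keys and values, so each audio frame gathers information from the text tokens. A minimal NumPy sketch of that interaction (single head, no learned projections, purely illustrative of the mechanism rather than MLCCT itself):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))   # stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention: one modality queries the other."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d))  # (Tq, Tk)
    return weights @ values, weights

rng = np.random.default_rng(4)
audio = rng.standard_normal((6, 16))   # 6 audio frames, 16-dim embeddings
text = rng.standard_normal((4, 16))    # 4 text tokens, same embedding size
fused, w = cross_modal_attention(audio, text, text)
```

A full block would add learned query/key/value projections, multiple heads, residual connections and layer normalization around this kernel.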
Design of Hierarchical Classifier to Improve Speech Emotion Recognition (Cited by 1)
10
Author: P. Vasuki. Computer Systems Science & Engineering, SCIE, EI, 2023, Issue 1, pp. 19-33 (15 pages)
Automatic Speech Emotion Recognition (SER) is used to recognize emotion from speech automatically. Speech emotion recognition works well in a laboratory environment, but real-time emotion recognition is influenced by variations in gender, age, and the cultural and acoustical background of the speaker. The acoustical resemblance between emotional expressions further increases the complexity of recognition. Many recent research works have concentrated on addressing these effects individually. Instead of addressing every influencing attribute individually, we would like to design a system which reduces the effect that arises from any factor. We propose a two-level hierarchical classifier named Interpreter of Responses (IR). The first level of IR has been realized using Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers. In the second level of IR, a discriminative SVM classifier has been trained and tested with meta-information from the first-level classifiers along with the input acoustical feature vector used in the primary classifiers. To train the system with a corpus of versatile nature, an integrated emotion corpus has been composed using emotion samples of five speech corpora, namely EMO-DB, IITKGP-SESC, the SAVEE corpus, the Spanish emotion corpus, and CMU's Woogle corpus. The hierarchical classifier has been trained and tested using MFCC and Low-Level Descriptors (LLD). The empirical analysis shows that the proposed classifier outperforms the traditional classifiers. The proposed ensemble design is very generic and can be adapted even when the number and nature of features change. The first-level classifiers, GMM or SVM, may be replaced with any other learning algorithm.
Keywords: speech emotion recognition; hierarchical classifier design; ensemble; emotion speech corpora
Fine-Grained Action Recognition Based on Temporal Pyramid Excitation Network (Cited by 1)
11
Authors: Xuan Zhou, Jianping Yi. Intelligent Automation & Soft Computing, SCIE, 2023, Issue 8, pp. 2103-2116 (14 pages)
Mining more discriminative temporal features to enrich temporal context representation is considered the key to fine-grained action recognition. Previous action recognition methods utilize a fixed spatiotemporal window to learn local video representations. However, these methods fail to capture complex motion patterns due to their limited receptive field. To solve the above problems, this paper proposes a lightweight Temporal Pyramid Excitation (TPE) module to capture short-, medium-, and long-term temporal context. In this method, the Temporal Pyramid (TP) module can effectively expand the temporal receptive field of the network by using multi-temporal kernel decomposition without significantly increasing the computational cost. In addition, the Multi-Excitation module can emphasize temporal importance to enhance temporal feature representation learning. TPE can be integrated into ResNet50 to build a compact video learning framework, TPENet. Extensive validation experiments on several challenging benchmark datasets (Something-Something V1, Something-Something V2, UCF-101, and HMDB51) demonstrate that our method achieves a preferable balance between computation and accuracy.
Keywords: fine-grained action recognition; temporal pyramid excitation module; temporal receptive field; multi-excitation module
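The temporal-pyramid idea above, running parallel temporal convolutions with different kernel sizes so the branches see short, medium, and long contexts, can be sketched in one dimension with NumPy. The averaging kernels and branch sizes here are illustrative stand-ins for the learned kernels of the actual TP module:

```python
import numpy as np

def temporal_conv(x, kernel):
    """1D convolution along the time axis with same-length output."""
    pad = len(kernel) // 2
    xp = np.pad(x, (pad, pad), mode="edge")
    return np.convolve(xp, kernel, mode="valid")

def temporal_pyramid(x):
    """Average of short/medium/long-term branches (kernel sizes 3, 5, 7)."""
    branches = []
    for k in (3, 5, 7):
        kernel = np.ones(k) / k     # simple averaging stand-in for learned weights
        branches.append(temporal_conv(x, kernel))
    return sum(branches) / len(branches)

rng = np.random.default_rng(5)
feat = rng.standard_normal(32)      # one channel's activation over 32 frames
out = temporal_pyramid(feat)
```

Because the branches share the input and only the kernel length varies, the effective receptive field grows without the cost of one large dense kernel, which is the efficiency argument the abstract makes.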
Improved Speech Emotion Recognition Focusing on High-Level Data Representations and Swift Feature Extraction Calculation
12
Authors: Akmalbek Abdusalomov, Alpamis Kutlimuratov, Rashid Nasimov, Taeg Keun Whangbo. Computers, Materials & Continua, SCIE, EI, 2023, Issue 12, pp. 2915-2933 (19 pages)
The performance of a speech emotion recognition (SER) system is heavily influenced by the efficacy of its feature extraction techniques. This study was designed to advance the field of SER by optimizing feature extraction techniques, specifically through the incorporation of high-resolution Mel-spectrograms and the expedited calculation of Mel Frequency Cepstral Coefficients (MFCC). This initiative aimed to refine the system's accuracy by identifying and mitigating the shortcomings commonly found in current approaches. Ultimately, the primary objective was to elevate both the intricacy and effectiveness of our SER model, with a focus on augmenting its proficiency in the accurate identification of emotions in spoken language. The research employed a dual-strategy approach for feature extraction. Firstly, a rapid computation technique for MFCC was implemented and integrated with a Bi-LSTM layer to optimize the encoding of MFCC features. Secondly, a pretrained ResNet model was utilized in conjunction with feature stats pooling and dense layers for the effective encoding of Mel-spectrogram attributes. These two sets of features underwent separate processing before being combined in a Convolutional Neural Network (CNN) outfitted with a dense layer, with the aim of enhancing their representational richness. The model was rigorously evaluated using two prominent databases: CMU-MOSEI and RAVDESS. Notable findings include an accuracy rate of 93.2% on the CMU-MOSEI database and 95.3% on the RAVDESS database. Such exceptional performance underscores the efficacy of this innovative approach, which not only meets but exceeds the accuracy benchmarks established by traditional models in the field of speech emotion recognition.
Keywords: feature extraction; MFCC; ResNet; speech emotion recognition
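The MFCC pipeline referenced throughout these entries follows a standard recipe: power spectrum of a windowed frame, triangular mel filterbank, log, then a DCT-II to decorrelate. A compact NumPy sketch of that recipe for a single frame (the parameter values are common defaults, not the paper's; production code would typically use a library such as librosa):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for j in range(l, c):
            fb[i, j] = (j - l) / max(c - l, 1)   # rising edge
        for j in range(c, r):
            fb[i, j] = (r - j) / max(r - c, 1)   # falling edge
    return fb

def dct2(x):
    # DCT-II of each row (all coefficients kept).
    n = x.shape[-1]
    k = np.arange(n)
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    return x @ basis.T

def mfcc(frame, sr=16000, n_fft=512, n_filters=26, n_coeffs=13):
    spectrum = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    mel_energies = mel_filterbank(n_filters, n_fft, sr) @ spectrum
    return dct2(np.log(mel_energies + 1e-10)[None, :])[0, :n_coeffs]

rng = np.random.default_rng(6)
frame = rng.standard_normal(400)   # one 25 ms frame at 16 kHz
coeffs = mfcc(frame)
```

"Expedited" MFCC computation, as the entry above describes, typically attacks the FFT and filterbank steps, since those dominate the cost of this pipeline.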
Text Augmentation-Based Model for Emotion Recognition Using Transformers
13
Authors: Fida Mohammad, Mukhtaj Khan, Safdar Nawaz Khan Marwat, Naveed Jan, Neelam Gohar, Muhammad Bilal, Amal Al-Rasheed. Computers, Materials & Continua, SCIE, EI, 2023, Issue 9, pp. 3523-3547 (25 pages)
Emotion Recognition in Conversations (ERC) is fundamental in creating emotionally intelligent machines. Graph-Based Network (GBN) models have gained popularity in detecting conversational contexts for ERC tasks. However, their limited ability to collect and acquire contextual information hinders their effectiveness. We propose a Text Augmentation-based computational model for recognizing emotions using transformers (TA-MERT) to address this. The proposed model uses the Multimodal Emotion Lines Dataset (MELD), which ensures a balanced representation for recognizing human emotions. The model used text augmentation techniques to produce more training data, improving the proposed model's accuracy. Transformer encoders train the deep neural network (DNN) model, especially Bidirectional Encoder (BE) representations that capture both forward and backward contextual information. This integration improves the accuracy and robustness of the proposed model. Furthermore, we present a method for balancing the training dataset by creating enhanced samples from the original dataset. By balancing the dataset across all emotion categories, we can lessen the adverse effects of data imbalance on the accuracy of the proposed model. Experimental results on the MELD dataset show that TA-MERT outperforms earlier methods, achieving a weighted F1 score of 62.60% and an accuracy of 64.36%. Overall, the proposed TA-MERT model addresses the GBN models' weaknesses in obtaining contextual data for ERC. The TA-MERT model recognizes human emotions more accurately by employing text augmentation and transformer-based encoding. The balanced dataset and the additional training samples also enhance its resilience. These findings highlight the significance of transformer-based approaches for emotion recognition in conversations.
Keywords: emotion recognition in conversation; graph-based network; text augmentation-based model; multimodal emotion lines dataset; bidirectional encoder representation for transformer
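Text augmentation of the kind TA-MERT relies on produces extra training utterances by making small, label-preserving perturbations of the originals. The entry does not specify the exact operations, so the sketch below uses EDA-style random token swapping as one plausible, commonly used example (pure standard library):

```python
import random

def random_swap(tokens, n_swaps, rng):
    """EDA-style augmentation: swap two random token positions n times."""
    out = tokens[:]
    for _ in range(n_swaps):
        i, j = rng.randrange(len(out)), rng.randrange(len(out))
        out[i], out[j] = out[j], out[i]
    return out

def augment(sentence, copies=3, seed=42):
    """Produce several shuffled variants of one training utterance."""
    rng = random.Random(seed)
    tokens = sentence.split()
    return [" ".join(random_swap(tokens, n_swaps=2, rng=rng))
            for _ in range(copies)]

samples = augment("i am so happy to see you today")
```

Generating more copies for under-represented emotion classes than for dominant ones is one straightforward way to realize the class-balancing step the abstract describes.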
A Multi-Modal Deep Learning Approach for Emotion Recognition
14
Authors: H. M. Shahzad, Sohail Masood Bhatti, Arfan Jaffar, Muhammad Rashid. Intelligent Automation & Soft Computing, SCIE, 2023, Issue 5, pp. 1561-1570 (10 pages)
In recent years, research on facial expression recognition (FER) under masks has been trending. Wearing a mask for protection from COVID-19 has become a compulsion, and it hides facial expressions, which is why FER under a mask is a difficult task. The prevailing unimodal techniques for facial recognition are not up to the mark in terms of good results for the masked face; however, a multi-modal technique can be employed to generate better results. We propose a multi-modal methodology based on deep learning for facial recognition under a masked face using facial and vocal expressions. The multimodal model has been trained on facial and vocal datasets. We have used two standard datasets: M-LFW for the masked face dataset, and the CREMA-D and TESS datasets for vocal expressions. The vocal expressions are in the form of audio while the face data is in image form, which is why the data is heterogeneous. In order to make the data homogeneous, the voice data is converted into images by taking spectrograms. A spectrogram embeds important features of the voice and converts the audio format into images. Later, the dataset is passed to the multimodal neural network for training, and the experimental results demonstrate that the proposed multimodal algorithm outperforms unimodal methods and other state-of-the-art deep neural network models.
Keywords: deep learning; facial expression recognition; multi-modal neural network; speech emotion recognition; spectrogram; COVID-19
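Converting audio to a spectrogram image, the homogenization step the entry above relies on, amounts to a short-time Fourier transform: slice the waveform into overlapping windowed frames, take the FFT of each, and stack the magnitudes into a 2D array. A minimal NumPy sketch (frame and hop sizes are illustrative defaults, not the paper's settings):

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: windowed FFT frames stacked into an image."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T   # (freq_bins, time_frames)

sr = 8000
t = np.arange(sr) / sr                    # one second of audio
tone = np.sin(2 * np.pi * 440 * t)        # a 440 Hz test tone
spec = spectrogram(tone)
```

The resulting 2D array can be rendered or normalized like any grayscale image, which is what lets an image-oriented network consume the audio branch alongside the face branch.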
Using Speaker-Specific Emotion Representations in Wav2vec 2.0-Based Modules for Speech Emotion Recognition
15
Authors: Somin Park, Mpabulungi Mark, Bogyung Park, Hyunki Hong. Computers, Materials & Continua, SCIE, EI, 2023, Issue 10, pp. 1009-1030 (22 pages)
Speech emotion recognition is essential for frictionless human-machine interaction, where machines respond to human instructions with context-aware actions. The properties of individuals' voices vary with culture, language, gender, and personality. These variations in speaker-specific properties may hamper the performance of standard representations in downstream tasks such as speech emotion recognition (SER). This study demonstrates the significance of speaker-specific speech characteristics and how considering them can be leveraged to improve the performance of SER models. In the proposed approach, two wav2vec-based modules (a speaker-identification network and an emotion classification network) are trained with the Arcface loss. The speaker-identification network has a single attention block to encode an input audio waveform into a speaker-specific representation. The emotion classification network uses a wav2vec 2.0 backbone as well as four attention blocks to encode the same input audio waveform into an emotion representation. These two representations are then fused into a single vector representation containing emotion and speaker-specific information. Experimental results showed that the use of speaker-specific characteristics improves SER performance. Additionally, combining these with an angular margin loss such as the Arcface loss improves intra-class compactness while increasing inter-class separability, as demonstrated by plots of t-distributed stochastic neighbor embeddings (t-SNE). The proposed approach outperforms previous methods using similar training strategies, with a weighted accuracy (WA) of 72.14% and an unweighted accuracy (UA) of 72.97% on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset. This demonstrates its effectiveness and potential to enhance human-machine interaction through more accurate emotion recognition in speech.
Keywords: attention block; IEMOCAP dataset; speaker-specific representation; speech emotion recognition; wav2vec 2.0
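The abstract publishes no code, but the Arcface loss it trains with has a well-known core step: an additive angular margin applied to the true-class cosine logit. A minimal numpy sketch of that step, with hypothetical shapes and hyperparameters (the paper's own settings are not stated in the abstract):

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, margin=0.5, scale=30.0):
    """Additive-angular-margin (ArcFace-style) logits.

    embeddings: (N, D) feature vectors; weights: (C, D) class centres;
    labels: (N,) integer class ids. margin/scale values are illustrative.
    """
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)          # cosine to every class centre
    theta = np.arccos(cos)
    # Add the margin only to the true class's angle, capped at pi so the
    # cosine stays monotone; this is what tightens intra-class clusters.
    theta[np.arange(len(labels)), labels] += margin
    return scale * np.cos(np.minimum(theta, np.pi))

rng = np.random.default_rng(0)
emb, w = rng.normal(size=(4, 8)), rng.normal(size=(3, 8))
y = np.array([0, 1, 2, 0])
plain = 30.0 * np.clip(
    (emb / np.linalg.norm(emb, axis=1, keepdims=True))
    @ (w / np.linalg.norm(w, axis=1, keepdims=True)).T, -1.0, 1.0)
logits = arcface_logits(emb, w, y)
# The margin can only lower the true-class logit, never raise it.
assert np.all(logits[np.arange(4), y] <= plain[np.arange(4), y] + 1e-9)
```

Training against these penalized logits forces each sample closer to its class centre than an unpenalized softmax would, which matches the t-SNE compactness effect the abstract reports.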
A Method of Multimodal Emotion Recognition in Video Learning Based on Knowledge Enhancement
16
Authors: Hanmin Ye, Yinghui Zhou, Xiaomei Tao. Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 11, pp. 1709-1732 (24 pages)
With the popularity of online learning and the significant influence of emotion on learning outcomes, more and more research focuses on emotion recognition in online learning. Most current research uses comments from the learning platform or the learner's facial expression for emotion recognition; research data on other modalities are scarce. Most studies also ignore the impact of instructional videos on learners and the guidance that knowledge can provide for the data. To address the need for other modal research data, we construct a synchronous multimodal dataset for analyzing learners' emotional states in online learning scenarios. The dataset records the eye movement data and photoplethysmography (PPG) signals of 68 subjects and the instructional videos they watched. To address the neglect of instructional videos and of knowledge, a multimodal emotion recognition method in video learning based on knowledge enhancement is proposed. This method uses knowledge-based features extracted from instructional videos, such as brightness, hue, saturation, the videos' click-through rate, and emotion generation time, to guide the emotion recognition process of physiological signals. It uses Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to extract deeper emotional representations and spatiotemporal information from shallow features, and a multi-head attention (MHA) mechanism to obtain critical information in the extracted deep features. A Temporal Convolutional Network (TCN) then learns the information in the deep features and the knowledge-based features, with the knowledge-based features supplementing and enhancing the deep features of the physiological signals. Finally, a fully connected layer performs emotion recognition, and the recognition accuracy reaches 97.51%. Compared with two recent studies, the accuracy improved by 8.57% and 2.11%, respectively. On four public datasets, the proposed method also achieves better results than the two recent studies. The experimental results show that the proposed knowledge-enhanced multimodal emotion recognition method has good performance and robustness.
Keywords: emotion recognition; video learning; physiological signal; knowledge enhancement; deep learning; CNN; LSTM; TCN
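As a rough illustration of the knowledge-enhancement idea only (not the authors' pipeline, whose CNN/LSTM, MHA, and TCN stages are omitted here), the video-level knowledge features can be normalized and concatenated with the physiological deep features before the fully connected classifier:

```python
import numpy as np

def fuse_with_knowledge(deep_feats, knowledge_feats):
    """Concatenate physiological deep features with video-level knowledge
    features, z-scoring each block first so neither modality dominates
    the downstream fully connected classifier. Shapes are hypothetical."""
    def z(x):
        return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)
    return np.concatenate([z(deep_feats), z(knowledge_feats)], axis=1)

rng = np.random.default_rng(1)
deep = rng.normal(size=(68, 128))   # e.g. PPG / eye-movement embeddings, 68 subjects
know = rng.uniform(size=(68, 5))    # brightness, hue, saturation, CTR, onset time
fused = fuse_with_knowledge(deep, know)
assert fused.shape == (68, 133)
```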
Performance Analysis of a Chunk-Based Speech Emotion Recognition Model Using RNN
17
Authors: Hyun-Sam Shin, Jun-Ki Hong. Intelligent Automation & Soft Computing (SCIE), 2023, Issue 4, pp. 235-248 (14 pages)
Recently, artificial-intelligence-based automatic customer response systems have been widely used in place of customer service representatives. It is therefore important for automatic customer service to promptly recognize emotions in a customer's voice and provide the appropriate service accordingly. We analyzed emotion recognition (ER) accuracy as a function of simulation time using the proposed chunk-based speech ER (CSER) model. The proposed CSER model divides voice signals into 3-s-long chunks to efficiently recognize the emotions inherent in the customer's voice. We evaluated the ER performance on voice signal chunks by applying four RNN techniques, long short-term memory (LSTM), bidirectional LSTM, gated recurrent units (GRU), and bidirectional GRU, to the proposed CSER model individually to assess its ER accuracy and time efficiency. The results reveal that GRU shows the best time efficiency in recognizing emotions from speech signals in terms of accuracy as a function of simulation time.
Keywords: RNN; speech emotion recognition; attention mechanism; time efficiency
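The chunking step described above is straightforward to sketch. A minimal version, assuming a 16 kHz sample rate (the abstract specifies only the 3-second chunk length):

```python
import numpy as np

def split_into_chunks(signal, sample_rate=16000, chunk_seconds=3):
    """Split a 1-D waveform into fixed-length chunks for chunk-based SER,
    dropping the incomplete tail. sample_rate is an assumed value."""
    size = sample_rate * chunk_seconds
    n = len(signal) // size            # number of complete chunks
    return signal[: n * size].reshape(n, size)

wave = np.zeros(16000 * 10)            # 10 s of audio at 16 kHz
chunks = split_into_chunks(wave)
assert chunks.shape == (3, 48000)      # three full 3-second chunks; tail dropped
```

Each row would then be fed to the RNN (LSTM/GRU variants) independently, which is what lets the model emit a prediction without waiting for the full utterance.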
TC-Net: A Modest & Lightweight Emotion Recognition System Using Temporal Convolution Network
18
Authors: Muhammad Ishaq, Mustaqeem Khan, Soonil Kwon. Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 9, pp. 3355-3369 (15 pages)
Speech signals play an essential role in communication and provide an efficient way to exchange information between humans and machines. Speech Emotion Recognition (SER) is one of the critical sources for human evaluation and is applicable in many real-world settings such as healthcare, call centers, robotics, safety, and virtual reality. This work developed a novel TCN-based emotion recognition system that uses a spatial-temporal convolution network to recognize the speaker's emotional state from speech signals. The authors designed a Temporal Convolutional Network (TCN) core block to recognize long-term dependencies in speech signals and then feed these temporal cues to a dense network to fuse the spatial features and recognize global information for final classification. The proposed network automatically extracts valid sequential cues from speech signals and performs better than state-of-the-art (SOTA) and traditional machine learning algorithms. The results show a high recognition rate compared with SOTA methods. Final unweighted accuracies of 80.84% and 92.31% on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Berlin Emotional Database (EMO-DB) datasets indicate the robustness and efficiency of the designed model.
Keywords: affective computing; deep learning; emotion recognition; speech signal; temporal convolutional network
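A TCN core block rests on dilated causal convolutions, which is how it captures the long-term dependencies mentioned above. A toy numpy version, independent of the paper's code, makes the causality constraint concrete:

```python
import numpy as np

def causal_dilated_conv(x, kernel, dilation):
    """1-D causal convolution with dilation: the output at step t sees only
    x[t], x[t-d], x[t-2d], ... so no future samples leak into the past."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])    # left-padding preserves causality
    return np.array([
        sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(8, dtype=float)
y = causal_dilated_conv(x, kernel=np.array([1.0, 1.0]), dilation=2)
# With this kernel, y[t] = x[t] + x[t-2] (zeros before the sequence starts).
assert np.allclose(y, [0, 1, 2, 4, 6, 8, 10, 12])
```

Stacking such layers with dilations 1, 2, 4, ... doubles the receptive field per layer, which is the standard way a TCN covers long utterances with few parameters.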
Music Emotion Recognition Based on Feature Fusion Broad Learning Method
19
Authors: 郁进明, 张晨光, 海涵. Journal of Donghua University (English Edition) (CAS), 2023, Issue 3, pp. 343-350 (8 pages)
With the rapid development of artificial intelligence and natural language processing (NLP), research on music retrieval has gained importance. Music conveys emotional signals, and the emotional classification of music can help in conveniently organizing and retrieving it; it is also a prerequisite for using music for psychological intervention and physiological adjustment. A new chord-to-vector method was proposed, which converts the chord information of music into a chord vector and combines the weighted Mel-frequency cepstral coefficient (MFCC) and residual phase (RP) features with the feature fusion of a cochleogram. Music emotion recognition and classification training was carried out using the fusion of a convolutional neural network and bidirectional long short-term memory (BiLSTM). In addition, the proposed model was compared with other model structures on a self-collected dataset. The results show that the proposed method achieved higher recognition accuracy than the other models.
Keywords: music emotion recognition; broad learning; residual phase (RP); deep learning
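The weighted MFCC/RP combination can be pictured as a simple frame-level weighted fusion. The weights below are purely hypothetical; the abstract does not state how the paper sets them:

```python
import numpy as np

def weighted_fusion(mfcc, rp, w_mfcc=0.6, w_rp=0.4):
    """Frame-level weighted fusion of MFCC and residual-phase (RP) features.
    Both inputs must share the shape (frames, coefficients); the weights
    here are illustrative placeholders, not the paper's tuned values."""
    assert mfcc.shape == rp.shape
    return w_mfcc * mfcc + w_rp * rp

mfcc = np.ones((100, 13))          # 100 frames x 13 MFCC coefficients
rp = np.full((100, 13), 2.0)       # matching RP features
fused = weighted_fusion(mfcc, rp)
assert fused.shape == (100, 13)
assert np.allclose(fused, 1.4)     # 0.6 * 1.0 + 0.4 * 2.0
```

The fused frames would then be stacked with the cochleogram and chord-vector inputs before the CNN-BiLSTM classifier described in the abstract.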
Deep Facial Emotion Recognition Using Local Features Based on Facial Landmarks for Security System
20
Authors: Youngeun An, Jimin Lee, Eunsang Bak, Sungbum Pan. Computers, Materials & Continua (SCIE, EI), 2023, Issue 8, pp. 1817-1832 (16 pages)
Emotion recognition based on facial expressions is one of the most critical elements of human-machine interfaces. Most conventional methods extract features from the entire facial image and then recognize specific emotions through a pre-trained model. In contrast, this paper proposes a novel feature vector extraction method using the Euclidean distances between landmarks that change position with facial expression, especially around the eyes, eyebrows, nose, and mouth. A new classifier using an ensemble network is then applied to increase emotion recognition accuracy. The emotion recognition performance was compared with conventional algorithms on public databases. The results indicate that the proposed method achieved higher accuracy than traditional facial-expression-based methods. In particular, experiments with the FER2013 database show that the proposed method is robust to lighting conditions and backgrounds, with on average 25% higher performance than previous studies. Consequently, the proposed method is expected to recognize facial expressions, especially fear and anger, and help prevent severe accidents by detecting security-related or dangerous actions in advance.
Keywords: facial emotion recognition; landmark-based feature extraction; ensemble network; robustness to changes in illumination and background; dangerous situation detection; accident prevention
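The landmark-distance feature vector described above is easy to sketch: collect every pairwise Euclidean distance between detected landmark coordinates. A minimal numpy version with four hypothetical landmark points (a real detector would supply 68 or more):

```python
import numpy as np

def landmark_distance_vector(landmarks):
    """Feature vector of pairwise Euclidean distances between facial
    landmarks. Landmark positions shift with expression, so the distances
    (rather than raw pixels) encode the expression itself and are less
    sensitive to illumination and background."""
    n = len(landmarks)
    i, j = np.triu_indices(n, k=1)     # every unordered pair once
    return np.linalg.norm(landmarks[i] - landmarks[j], axis=1)

# Four hypothetical landmark coordinates (x, y) in pixels
pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 0.0], [3.0, -4.0]])
d = landmark_distance_vector(pts)
assert d.shape == (6,)                 # C(4, 2) = 6 pairwise distances
assert np.isclose(d[0], 5.0)           # |(0,0) - (3,4)| = 5
```

The resulting vector would then be fed to the ensemble classifier; because it contains only relative geometry, it is plausible that it inherits the lighting/background robustness the abstract reports.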