Journal Articles
2 articles found
1. Self-attention transfer networks for speech emotion recognition (Cited by: 4)
Authors: Ziping ZHAO, Keru WANG, Zhongtian BAO, Zixing ZHANG, Nicholas CUMMINS, Shihuang SUN, Haishuai WANG, Jianhua TAO, Björn W. SCHULLER. Virtual Reality & Intelligent Hardware, 2021, Issue 1, pp. 43-54 (12 pages).
Background A crucial element of human-machine interaction, the automatic detection of emotional states from human speech has long been regarded as a challenging task for machine learning models. One vital challenge in speech emotion recognition (SER) is learning robust and discriminative representations from speech. Although machine learning methods have been widely applied in SER research, the inadequate amount of available annotated data has become a bottleneck impeding the extended application of such techniques (e.g., deep neural networks). To address this issue, we present a deep learning method that combines knowledge transfer and self-attention for SER tasks. Herein, we apply the log-Mel spectrogram with deltas and delta-deltas as inputs. Moreover, given that emotions are time dependent, we apply temporal convolutional neural networks to model the variations in emotions. We further introduce an attention transfer mechanism, which is based on a self-attention algorithm to learn long-term dependencies. The self-attention transfer network (SATN) in our proposed approach takes advantage of attention transfer to learn attention from speech recognition, followed by transferring this knowledge into SER. An evaluation built on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset demonstrates the effectiveness of the proposed model.
Keywords: Speech emotion recognition; Attention transfer; Self-attention; Temporal convolutional neural networks (TCNs)
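The self-attention mechanism named in this abstract can be sketched as scaled dot-product attention over a sequence of frame features. The sketch below is purely illustrative: the projection matrices, dimensions, and random inputs are hypothetical, and it does not reproduce the paper's attention-transfer step from speech recognition.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention.

    X: (T, d) sequence of frame features (e.g., log-Mel frames).
    Returns the (T, d_v) context and the (T, T) attention matrix.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # pairwise frame similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)            # softmax: each row sums to 1
    return A @ V, A

# Hypothetical toy input: 5 frames of 8-dimensional features.
rng = np.random.default_rng(0)
T, d, d_k = 5, 8, 4
X = rng.standard_normal((T, d))
W_q, W_k, W_v = (rng.standard_normal((d, d_k)) for _ in range(3))
context, A = self_attention(X, W_q, W_k, W_v)
```

Each output frame is a weighted mixture of all input frames, which is how self-attention captures the long-term dependencies the abstract refers to.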
2. Frustration recognition from speech during game interaction using wide residual networks
Authors: Meishu SONG, Adria MALLOL-RAGOLTA, Emilia PARADA-CABALEIRO, Zijiang YANG, Shuo LIU, Zhao REN, Ziping ZHAO, Björn W. SCHULLER. Virtual Reality & Intelligent Hardware, 2021, Issue 1, pp. 76-86 (11 pages).
Background Although frustration is a common emotional reaction while playing games, an excessive level of frustration can negatively impact a user's experience, discouraging them from further game interactions. The automatic detection of frustration can enable the development of adaptive systems that adjust a game to a user's specific needs through real-time difficulty adjustment, thereby optimizing the player's experience and guaranteeing game success. To this end, we present a speech-based approach for the automatic detection of frustration during game interactions, a specific task that remains underexplored in research. Method The experiments were performed on the Multimodal Game Frustration Database (MGFD), an audiovisual dataset, collected within the Wizard-of-Oz framework, that is specially tailored to investigate verbal and facial expressions of frustration during game interactions. We explored the performance of a variety of acoustic feature sets, including Mel-spectrograms, Mel-frequency cepstral coefficients (MFCCs), and the low-dimensional knowledge-based acoustic feature set eGeMAPS. Motivated by the continual improvements in speech recognition tasks achieved by the use of convolutional neural networks (CNNs), and unlike the MGFD baseline, which is based on the Long Short-Term Memory (LSTM) architecture and a Support Vector Machine (SVM) classifier, in the present work we consider typical CNNs, including ResNet, VGG, and AlexNet. Furthermore, given the unresolved debate on the suitability of shallow and deep networks, we also examine the performance of two of the latest deep CNNs: WideResNet and EfficientNet. Results Our best result, achieved with WideResNet and Mel-spectrogram features, increases the system performance from 58.8% unweighted average recall (UAR) to 93.1% UAR for speech-based automatic frustration recognition.
Keywords: Frustration recognition; WideResNets; Machine learning
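The results above are reported in unweighted average recall (UAR), the mean of per-class recalls, which, unlike plain accuracy, is insensitive to class imbalance. A minimal sketch of the metric, with hypothetical toy labels:

```python
from collections import defaultdict

def uar(y_true, y_pred):
    """Unweighted average recall: the mean of per-class recalls."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += (t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Imbalanced toy example: a classifier that always predicts "neutral"
# scores 4/5 = 0.8 accuracy, but its recall on "frustrated" is 0,
# so UAR = (1.0 + 0.0) / 2 = 0.5.
y_true = ["neutral", "neutral", "neutral", "neutral", "frustrated"]
y_pred = ["neutral", "neutral", "neutral", "neutral", "neutral"]
```

This is why UAR is the conventional metric for emotion recognition corpora, where the classes of interest are typically rare.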