Abstract
This paper focuses on acoustic features that effectively improve the recognition of emotion in human speech. The novel features are spectral entropy parameters: fast Fourier transform (FFT) spectral entropy, delta FFT spectral entropy, Mel-frequency filter bank (MFB) spectral entropy, and delta MFB spectral entropy. Spectral entropy features are simple to compute, and they reflect both the frequency characteristics of speech and the changes in those characteristics. To reduce the false recognition rate and improve overall performance, we implement an emotion rejection module based on the probability distributions of recognized-scores and rejected-scores, which are the probabilities of recognized and rejected emotion recognition results, respectively. These scores are first obtained from a pattern recognition procedure that uses the Gaussian mixture model (GMM). We classify four emotional states: anger, sadness, happiness, and neutrality. The proposed method is evaluated using 45 sentences per emotion for 30 subjects (15 male and 15 female). Experimental results show that the proposed method is superior to existing GMM-based emotion recognition methods that use energy, zero-crossing rate (ZCR), linear prediction coefficient (LPC), and pitch parameters, demonstrating the effectiveness of the proposed approach. One of the proposed features, the combination of MFB and delta MFB spectral entropy, improves performance by approximately 10% compared to the existing feature parameters for speech emotion recognition. Applying emotion rejection to results with low confidence scores yields a 4% performance improvement.
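As an illustration of the FFT spectral entropy and delta FFT spectral entropy features named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the common definition of spectral entropy as the Shannon entropy of the normalized power spectrum of each frame, and a delta computed as a simple frame-to-frame difference. The function names, the Hamming window, the frame length, hop size, FFT size, and the synthetic test signals are all assumptions chosen for the example.

```python
import numpy as np


def frame_signal(signal, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (sizes are illustrative)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])


def fft_spectral_entropy(frame, n_fft=512, eps=1e-12):
    """Shannon entropy (bits) of the normalized FFT power spectrum of one frame.

    Assumed definition: treat the power spectrum as a probability
    distribution over frequency bins and take its entropy.
    """
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2
    prob = power / (power.sum() + eps)
    return float(-np.sum(prob * np.log2(prob + eps)))


def delta(values):
    """Frame-to-frame difference; the first delta is defined as 0 (assumption)."""
    values = np.asarray(values, dtype=float)
    return np.diff(values, prepend=values[0])


# Synthetic one-second signals at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)        # energy concentrated in few bins
noise = rng.standard_normal(sample_rate)  # energy spread over all bins

tone_entropy = np.array([fft_spectral_entropy(f) for f in frame_signal(tone)])
noise_entropy = np.array([fft_spectral_entropy(f) for f in frame_signal(noise)])
tone_delta = delta(tone_entropy)

print("mean entropy (tone): ", tone_entropy.mean())
print("mean entropy (noise):", noise_entropy.mean())
```

Under these assumptions, a frame whose energy is concentrated in a few frequency bins gives a low entropy, while a noise-like frame gives a high one. The delta sequence tracks how the entropy changes from frame to frame. An MFB variant would apply the same entropy computation to Mel filter bank energies instead of raw FFT bins.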
Funding
Supported by MIC, Korea, under ITRC IITA-2009-(C1090-0902-0046), and by
the Korea Science and Engineering Foundation (KOSEF), funded by the Korea government (MEST) (Grant No. 20090058909)