摘要
Speech emotion is divided into four categories, Fear, Happy, Neutral and Surprise in this paper. Traditional features and their statistics are generally applied to recognize speech emotion. In order to quantify each feature’s contribution to emotion recogni-tion, a method based on the Back Propagation (BP) neural network is adopted. Then we can obtain the optimal subset of the features. What’s more, two new characteristics of speech emotion, MFCC feature extracted from the fundamental frequency curve (MFCCF0) and amplitude perturbation parameters extracted from the short- time av-erage magnitude curve (APSAM), are added to the selected features. With the Gaus-sian Mixture Model (GMM), we get the highest average recognition rate of the four emotions 82.25%, and the recognition rate of Neutral 90%.
Speech emotion is divided into four categories, Fear, Happy, Neutral and Surprise in this paper. Traditional features and their statistics are generally applied to recognize speech emotion. In order to quantify each feature’s contribution to emotion recogni-tion, a method based on the Back Propagation (BP) neural network is adopted. Then we can obtain the optimal subset of the features. What’s more, two new characteristics of speech emotion, MFCC feature extracted from the fundamental frequency curve (MFCCF0) and amplitude perturbation parameters extracted from the short- time av-erage magnitude curve (APSAM), are added to the selected features. With the Gaus-sian Mixture Model (GMM), we get the highest average recognition rate of the four emotions 82.25%, and the recognition rate of Neutral 90%.
作者
Chunxia Yu
Ling Xie
Weiping Hu
Chunxia Yu;Ling Xie;Weiping Hu(GuangXi Key Lab of Multi-Source Information Mining and Security, GuangXi Normal University, Guilin, China)