In order to improve the performance of speech emotion recognition, a novel feature fusion method is proposed. Based on the global features, the local information of different kinds of features is utilized. Both the gl...In order to improve the performance of speech emotion recognition, a novel feature fusion method is proposed. Based on the global features, the local information of different kinds of features is utilized. Both the global and the local features are combined together. Moreover, the multiple kernel learning method is adopted. The global features and each kind of local feature are respectively associated with a kernel, and all these kernels are added together with different weights to obtain a mixed kernel for nonlinear mapping. In the reproducing kernel Hilbert space, different kinds of emotional features can be easily classified. In the experiments, the popular Berlin dataset is used, and the optimal parameters of the global and the local kernels are determined by cross-validation. After computing using multiple kernel learning, the weights of all the kernels are obtained, which shows that the formant and intensity features play a key role in speech emotion recognition. The classification results show that the recognition rate is 78. 74% by using the global kernel, and it is 81.10% by using the proposed method, which demonstrates the effectiveness of the proposed method.展开更多
In order to improve the efficiency of speech emotion recognition across corpora,a speech emotion transfer learning method based on the deep sparse auto-encoder is proposed.The algorithm first reconstructs a small amou...In order to improve the efficiency of speech emotion recognition across corpora,a speech emotion transfer learning method based on the deep sparse auto-encoder is proposed.The algorithm first reconstructs a small amount of data in the target domain by training the deep sparse auto-encoder,so that the encoder can learn the low-dimensional structural representation of the target domain data.Then,the source domain data and the target domain data are coded by the trained deep sparse auto-encoder to obtain the reconstruction data of the low-dimensional structural representation close to the target domain.Finally,a part of the reconstructed tagged target domain data is mixed with the reconstructed source domain data to jointly train the classifier.This part of the target domain data is used to guide the source domain data.Experiments on the CASIA,SoutheastLab corpus show that the model recognition rate after a small amount of data transferred reached 89.2%and 72.4%on the DNN.Compared to the training results of the complete original corpus,it only decreased by 2%in the CASIA corpus,and only 3.4%in the SoutheastLab corpus.Experiments show that the algorithm can achieve the effect of labeling all data in the extreme case that the data set has only a small amount of data tagged.展开更多
Ambiguous expression is a common phenomenon in facial expression recognition(FER).Because of the existence of ambiguous expression,the effect of FER is severely limited.The reason maybe that the single label of the da...Ambiguous expression is a common phenomenon in facial expression recognition(FER).Because of the existence of ambiguous expression,the effect of FER is severely limited.The reason maybe that the single label of the data cannot effectively describe complex emotional intentions which are vital in FER.Label distribution learning contains more information and is a possible way to solve this problem.To apply label distribution learning on FER,a label distribution expression recognition algorithm based on asymptotic truth value is proposed.Under the premise of not incorporating extraneous quantitative information,the original information of database is fully used to complete the generation and utilization of label distribution.Firstly,in training part,single label learning is used to collect the mean value of the overall distribution of data.Then,the true value of data label is approached gradually on the granularity of data batch.Finally,the whole network model is retrained using the generated label distribution data.Experimental results show that this method can improve the accuracy of the network model obviously,and has certain competitiveness compared with the advanced algorithms.展开更多
基金The National Natural Science Foundation of China(No.61231002,61273266)the Priority Academic Program Development of Jiangsu Higher Education Institutions(PAPD)
文摘In order to improve the performance of speech emotion recognition, a novel feature fusion method is proposed. Based on the global features, the local information of different kinds of features is utilized. Both the global and the local features are combined together. Moreover, the multiple kernel learning method is adopted. The global features and each kind of local feature are respectively associated with a kernel, and all these kernels are added together with different weights to obtain a mixed kernel for nonlinear mapping. In the reproducing kernel Hilbert space, different kinds of emotional features can be easily classified. In the experiments, the popular Berlin dataset is used, and the optimal parameters of the global and the local kernels are determined by cross-validation. After computing using multiple kernel learning, the weights of all the kernels are obtained, which shows that the formant and intensity features play a key role in speech emotion recognition. The classification results show that the recognition rate is 78. 74% by using the global kernel, and it is 81.10% by using the proposed method, which demonstrates the effectiveness of the proposed method.
基金The National Natural Science Foundation of China(No.61871213,61673108,61571106)Six Talent Peaks Project in Jiangsu Province(No.2016-DZXX-023)
文摘In order to improve the efficiency of speech emotion recognition across corpora,a speech emotion transfer learning method based on the deep sparse auto-encoder is proposed.The algorithm first reconstructs a small amount of data in the target domain by training the deep sparse auto-encoder,so that the encoder can learn the low-dimensional structural representation of the target domain data.Then,the source domain data and the target domain data are coded by the trained deep sparse auto-encoder to obtain the reconstruction data of the low-dimensional structural representation close to the target domain.Finally,a part of the reconstructed tagged target domain data is mixed with the reconstructed source domain data to jointly train the classifier.This part of the target domain data is used to guide the source domain data.Experiments on the CASIA,SoutheastLab corpus show that the model recognition rate after a small amount of data transferred reached 89.2%and 72.4%on the DNN.Compared to the training results of the complete original corpus,it only decreased by 2%in the CASIA corpus,and only 3.4%in the SoutheastLab corpus.Experiments show that the algorithm can achieve the effect of labeling all data in the extreme case that the data set has only a small amount of data tagged.
基金National Youth Natural Science Foundation of China(No.61806006)Innovation Program for Graduate of Jiangsu Province(No.KYLX160-781)Project Supported by Jiangsu University Superior Discipline Construction Project。
文摘Ambiguous expression is a common phenomenon in facial expression recognition(FER).Because of the existence of ambiguous expression,the effect of FER is severely limited.The reason maybe that the single label of the data cannot effectively describe complex emotional intentions which are vital in FER.Label distribution learning contains more information and is a possible way to solve this problem.To apply label distribution learning on FER,a label distribution expression recognition algorithm based on asymptotic truth value is proposed.Under the premise of not incorporating extraneous quantitative information,the original information of database is fully used to complete the generation and utilization of label distribution.Firstly,in training part,single label learning is used to collect the mean value of the overall distribution of data.Then,the true value of data label is approached gradually on the granularity of data batch.Finally,the whole network model is retrained using the generated label distribution data.Experimental results show that this method can improve the accuracy of the network model obviously,and has certain competitiveness compared with the advanced algorithms.