表情图像是基于表情识别技术分析学习情感的基础,然而现有表情库样本数量有限、表情图像中仅包含单人表情、拍摄场景为实验室环境,这些局限性无法支持学习情感分析的深入研究。为了解决这些问题,文章搭建了北京师范大学学习情感数据库(B...表情图像是基于表情识别技术分析学习情感的基础,然而现有表情库样本数量有限、表情图像中仅包含单人表情、拍摄场景为实验室环境,这些局限性无法支持学习情感分析的深入研究。为了解决这些问题,文章搭建了北京师范大学学习情感数据库(Beijing Normal University Learning Affect Database,BNU LAD),同时对情感数据的标注方法进行了深入研究,最终形成了包含144位学习者的22708张表情图像、1792组图像序列及243段视频片段的学习情感数据库。该数据库对学习环境下的学习情感分析具有重要意义。展开更多
In order to improve the performance of speech emotion recognition, a novel feature fusion method is proposed. Based on the global features, the local information of different kinds of features is utilized. Both the gl...In order to improve the performance of speech emotion recognition, a novel feature fusion method is proposed. Based on the global features, the local information of different kinds of features is utilized. Both the global and the local features are combined together. Moreover, the multiple kernel learning method is adopted. The global features and each kind of local feature are respectively associated with a kernel, and all these kernels are added together with different weights to obtain a mixed kernel for nonlinear mapping. In the reproducing kernel Hilbert space, different kinds of emotional features can be easily classified. In the experiments, the popular Berlin dataset is used, and the optimal parameters of the global and the local kernels are determined by cross-validation. After computing using multiple kernel learning, the weights of all the kernels are obtained, which shows that the formant and intensity features play a key role in speech emotion recognition. The classification results show that the recognition rate is 78. 74% by using the global kernel, and it is 81.10% by using the proposed method, which demonstrates the effectiveness of the proposed method.展开更多
In order to improve the efficiency of speech emotion recognition across corpora,a speech emotion transfer learning method based on the deep sparse auto-encoder is proposed.The algorithm first reconstructs a small amou...In order to improve the efficiency of speech emotion recognition across corpora,a speech emotion transfer learning method based on the deep sparse auto-encoder is proposed.The algorithm first reconstructs a small amount of data in the target domain by training the deep sparse auto-encoder,so that the encoder can learn the low-dimensional structural representation of the target domain data.Then,the source domain data and the target domain data are coded by the trained deep sparse auto-encoder to obtain the reconstruction data of the low-dimensional structural representation close to the target domain.Finally,a part of the reconstructed tagged target domain data is mixed with the reconstructed source domain data to jointly train the classifier.This part of the target domain data is used to guide the source domain data.Experiments on the CASIA,SoutheastLab corpus show that the model recognition rate after a small amount of data transferred reached 89.2%and 72.4%on the DNN.Compared to the training results of the complete original corpus,it only decreased by 2%in the CASIA corpus,and only 3.4%in the SoutheastLab corpus.Experiments show that the algorithm can achieve the effect of labeling all data in the extreme case that the data set has only a small amount of data tagged.展开更多
文摘表情图像是基于表情识别技术分析学习情感的基础,然而现有表情库样本数量有限、表情图像中仅包含单人表情、拍摄场景为实验室环境,这些局限性无法支持学习情感分析的深入研究。为了解决这些问题,文章搭建了北京师范大学学习情感数据库(Beijing Normal University Learning Affect Database,BNU LAD),同时对情感数据的标注方法进行了深入研究,最终形成了包含144位学习者的22708张表情图像、1792组图像序列及243段视频片段的学习情感数据库。该数据库对学习环境下的学习情感分析具有重要意义。
基金The National Natural Science Foundation of China(No.61231002,61273266)the Priority Academic Program Development of Jiangsu Higher Education Institutions(PAPD)
文摘In order to improve the performance of speech emotion recognition, a novel feature fusion method is proposed. Based on the global features, the local information of different kinds of features is utilized. Both the global and the local features are combined together. Moreover, the multiple kernel learning method is adopted. The global features and each kind of local feature are respectively associated with a kernel, and all these kernels are added together with different weights to obtain a mixed kernel for nonlinear mapping. In the reproducing kernel Hilbert space, different kinds of emotional features can be easily classified. In the experiments, the popular Berlin dataset is used, and the optimal parameters of the global and the local kernels are determined by cross-validation. After computing using multiple kernel learning, the weights of all the kernels are obtained, which shows that the formant and intensity features play a key role in speech emotion recognition. The classification results show that the recognition rate is 78. 74% by using the global kernel, and it is 81.10% by using the proposed method, which demonstrates the effectiveness of the proposed method.
基金The National Natural Science Foundation of China(No.61871213,61673108,61571106)Six Talent Peaks Project in Jiangsu Province(No.2016-DZXX-023)
文摘In order to improve the efficiency of speech emotion recognition across corpora,a speech emotion transfer learning method based on the deep sparse auto-encoder is proposed.The algorithm first reconstructs a small amount of data in the target domain by training the deep sparse auto-encoder,so that the encoder can learn the low-dimensional structural representation of the target domain data.Then,the source domain data and the target domain data are coded by the trained deep sparse auto-encoder to obtain the reconstruction data of the low-dimensional structural representation close to the target domain.Finally,a part of the reconstructed tagged target domain data is mixed with the reconstructed source domain data to jointly train the classifier.This part of the target domain data is used to guide the source domain data.Experiments on the CASIA,SoutheastLab corpus show that the model recognition rate after a small amount of data transferred reached 89.2%and 72.4%on the DNN.Compared to the training results of the complete original corpus,it only decreased by 2%in the CASIA corpus,and only 3.4%in the SoutheastLab corpus.Experiments show that the algorithm can achieve the effect of labeling all data in the extreme case that the data set has only a small amount of data tagged.