Abstract
To address the need for fast and accurate analysis of speech emotion in speech signal cognition, a feature dimension reduction method based on convolutional neural network (CNN) learning is proposed. A large number of features are first extracted from the raw speech emotion data, and features of different dimensions are normalized to obtain the corresponding feature matrix. A CNN is then trained on the feature matrix, and the weights of the fully connected layer of the converged network are analyzed. Based on the learning characteristics of the network, a CNN-based feature selection criterion (FR-CNN) is defined: by comparing the activation weights of each class of features, the features most favorable for classification are computed and selected, yielding a compact and efficient feature set F for speech emotion recognition. Experiments were carried out on all eight emotion categories of the multimodal emotion database CHEAVD, provided by the Institute of Automation, Chinese Academy of Sciences. The class-average recognition error rate of a CNN classifier built on the full feature set is 2.1% lower than the baseline, whereas the same CNN classifier built on the reduced feature set F achieves a class-average error rate 9.4% lower than the baseline. Using only 15% of the original features, the method speeds up classifier convergence, lowers the recognition error rate, and reduces the complexity of a practical speech emotion recognition system. By combining different types of feature information and exploiting the learning characteristics of a CNN for secondary feature selection and dimension reduction, the study offers a new approach to feature extraction for speech emotion.
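The abstract outlines a pipeline (normalize the features, train a CNN, inspect the fully connected layer weights, keep about 15% of the features) but does not give the exact FR-CNN formula. The following is only a minimal sketch of that kind of weight-based selection, not the paper's method. Min-max scaling, the "spread of absolute weights across classes" score, and the names `normalize_features` and `select_features` are all assumptions made for illustration, and the weight matrix is synthetic rather than taken from a trained network.

```python
import numpy as np


def normalize_features(X):
    """Min-max scale each feature column to [0, 1] (assumed normalization)."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns become 0 instead of dividing by zero
    return (X - lo) / span


def select_features(weights, keep_ratio=0.15):
    """Return sorted indices of the top-scoring features.

    weights: array of shape (n_features, n_classes), a stand-in for weights
    read from a trained network's fully connected layer (hypothetical layout).
    """
    W = np.abs(np.asarray(weights, dtype=float))
    # Assumed score: how differently a feature is weighted across classes.
    score = W.max(axis=1) - W.min(axis=1)
    k = max(1, int(round(keep_ratio * W.shape[0])))
    top = np.argsort(score)[::-1][:k]  # indices of the k largest scores
    return np.sort(top)


# Synthetic example: 20 features, 8 emotion classes, 3 clearly informative features.
rng = np.random.default_rng(0)
W_demo = rng.normal(scale=0.01, size=(20, 8))
W_demo[[3, 7, 11], 0] = 1.0
selected = select_features(W_demo, keep_ratio=0.15)
print(selected)
```

In practice the selected indices would be used to slice the normalized feature matrix (for example `X_norm[:, selected]`) before retraining the same classifier on the reduced set.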
Source
Chinese High Technology Letters (《高技术通讯》)
Peking University Core Journal
2017, No. 11, pp. 889-898 (10 pages)
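Funding
National Natural Science Foundation of China (61671187)
Shenzhen Basic Research Program (JCYJ20150929143955341, JCYJ20150625142543470)
Open Fund of the Ministry of Education-Microsoft Key Laboratory of Language and Speech (HIT.KLOF.2015OXX, HIT.KLOF.2016OXX)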
Keywords
pattern recognition, speech emotion, convolutional neural network (CNN), feature selection criterion, feature reduction