
Research on a dimension reduction method of speech emotional feature based on convolution neural network
Cited by: 4
Abstract: To address the need for fast and accurate interpretation of speech emotion in speech-signal cognition, a feature dimension reduction method based on convolutional neural network (CNN) learning is proposed. On the basis of a large number of features extracted from the original speech emotion data, features of different dimensions are normalized to obtain their corresponding feature matrices. A CNN is trained on these feature matrices, and after convergence the weights of the network's fully connected layer are analyzed. From the learning characteristics of the network, a CNN-based feature selection criterion (FR-CNN) is defined: by comparing the activation weights of each class of features, the features most favorable for classification are computed and selected, yielding an efficient reduced feature set F for speech emotion cognition. Experiments on all eight emotion classes of the multimodal emotion database CHEAVD, provided by the Institute of Automation, Chinese Academy of Sciences, show that the class-average recognition error rate of a CNN classifier built on the full feature set is 2.1% lower than the baseline, while the same CNN classifier built on the reduced feature set F achieves a class-average error rate 9.4% lower than the baseline. Using only 15% of the original feature set after dimension reduction not only effectively speeds up classifier convergence and lowers the recognition error rate, but also reduces system complexity when building a practical speech emotion recognition system. By combining different types of feature information and exploiting the learning characteristics of a CNN for a second round of feature selection and dimension reduction, this study offers a new approach to feature extraction for speech emotion.
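The abstract's selection step scores each input feature by how its fully-connected-layer weights differ across classes and keeps only the most discriminative fraction. A minimal sketch of such a weight-based criterion is shown below; the function name, the max-min spread score, and the toy weight matrix are illustrative assumptions, not the authors' actual FR-CNN formula or data.

```python
# Hypothetical sketch of a weight-based feature selection criterion in the
# spirit of FR-CNN: given the fully-connected layer weights of an already
# trained CNN classifier, score each feature by how strongly its per-class
# weights differ, and keep the top fraction. The spread score and the toy
# weights below are illustrative, not the paper's exact criterion.

def select_features(fc_weights, keep_ratio=0.15):
    """fc_weights[i][c] = weight linking feature i to emotion class c."""
    scores = []
    for i, row in enumerate(fc_weights):
        # A feature helps classification when its per-class activation
        # weights spread apart: score it by the gap between the largest
        # and smallest class weight.
        scores.append((max(row) - min(row), i))
    scores.sort(reverse=True)
    k = max(1, int(len(fc_weights) * keep_ratio))  # e.g. keep 15%
    return sorted(i for _, i in scores[:k])

# Toy example: 10 features, 3 emotion classes.
weights = [
    [0.9, -0.8, 0.1], [0.0, 0.1, 0.0], [0.2, 0.1, 0.2],
    [-1.0, 1.1, 0.0], [0.1, 0.0, 0.1], [0.3, 0.2, 0.1],
    [0.0, 0.0, 0.1], [0.5, -0.5, 0.9], [0.1, 0.1, 0.1],
    [0.2, 0.3, 0.2],
]
selected = select_features(weights, keep_ratio=0.2)
print(selected)  # → [0, 3]
```

In practice the scoring would run over the trained network's weight matrix rather than a hand-written one, and the surviving feature indices would define the reduced set F fed to the final classifier.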
Source: Chinese High Technology Letters (《高技术通讯》, Peking University Core Journal), 2017, No. 11, pp. 889-898 (10 pages)
Funding: National Natural Science Foundation of China (61671187); Shenzhen Basic Research Program (JCYJ20150929143955341, JCYJ20150625142543470); Open Fund of the MOE-Microsoft Key Laboratory of Language and Speech (HIT.KLOF.2015OXX, HIT.KLOF.2016OXX)
Keywords: pattern recognition; speech emotion; convolutional neural network (CNN); feature selection criterion; feature reduction
