
Improved stacked autoencoder for Chinese speech emotion recognition (Cited by: 6)
Abstract: To further improve the recognition rate of Chinese speech emotion, an improved stacked autoencoder is proposed based on the autoencoder, denoising autoencoder and sparse autoencoder structures in deep learning. The first layer of the structure uses a denoising autoencoder to learn a hidden representation with a larger dimension than that of the input features, and the second layer employs a sparse autoencoder to learn sparse features. Finally, a softmax classifier is applied to classify the features. In the training process, layer-wise pre-training is first used to initialize all parameters of the network, and then the whole network is fine-tuned. Emotion recognition experiments on Chinese speech databases show that the improved stacked autoencoder achieves a better recognition rate than stacked denoising autoencoders or stacked sparse autoencoders used alone. In addition, comparative experiments on the CASIA database show that the recognition rate of the proposed structure is 53.7%, 29.8%, 14.3% and 1.9% higher than that of the K-nearest neighbor algorithm, the sparse representation method, the traditional support vector machine and the artificial neural network, respectively. On a self-recorded speech database, its recognition rate is 1.64% higher than that of the artificial neural network.
Source: Journal of Southeast University (Natural Science Edition), 2017, No. 4, pp. 631-636 (6 pages). Indexed in EI, CAS, CSCD; Peking University core journal.
Funding: National Natural Science Foundation of China (61375028, 61571106, 61673108); Qing Lan Project of Jiangsu Province; Jiangsu Postdoctoral Research Funding Program (1601011B); Six Talent Peaks Project of Jiangsu Province (2016-DZXX-023); China Postdoctoral Science Foundation (2016M601695).
Keywords: speech emotion recognition; enhanced stacked autoencoder; denoising autoencoder; sparse autoencoder
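The two-stage structure described in the abstract (an overcomplete denoising-autoencoder layer, a sparse-autoencoder layer, and a softmax classifier, trained by layer-wise pre-training and fine-tuning) can be sketched as a simple feed-forward pass after pre-training. This is a minimal illustrative sketch, not the authors' implementation: the layer dimensions, sigmoid activations, and random initialization below are assumptions, since the paper's metadata does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative assumptions, not values from the paper).
# D_HID1 > D_IN mirrors the paper's first layer learning a hidden feature
# with a larger dimension than the input.
D_IN, D_HID1, D_HID2, N_CLASSES = 384, 512, 128, 6

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stabilized
    return e / e.sum(axis=1, keepdims=True)

# Layer 1: encoder of a denoising autoencoder. During pre-training the input
# would be corrupted (e.g. masking noise) and a decoder would reconstruct the
# clean input; only the trained encoder weights are kept in the stack.
W1 = rng.normal(0.0, 0.01, (D_IN, D_HID1))
b1 = np.zeros(D_HID1)

# Layer 2: encoder of a sparse autoencoder. During pre-training a sparsity
# penalty (e.g. KL divergence to a small target activation) would be added
# to its reconstruction loss.
W2 = rng.normal(0.0, 0.01, (D_HID1, D_HID2))
b2 = np.zeros(D_HID2)

# Softmax classifier on top of the sparse features; fine-tuning would then
# update all weights jointly by backpropagation.
Wc = rng.normal(0.0, 0.01, (D_HID2, N_CLASSES))
bc = np.zeros(N_CLASSES)

def forward(x):
    h1 = sigmoid(x @ W1 + b1)   # overcomplete denoising-AE features
    h2 = sigmoid(h1 @ W2 + b2)  # sparse features
    return softmax(h2 @ Wc + bc)

batch = rng.normal(size=(4, D_IN))  # a toy batch of 4 utterance feature vectors
probs = forward(batch)
print(probs.shape)  # one emotion distribution per utterance
```

Each row of `probs` is a distribution over the six emotion classes; the predicted emotion would be its argmax.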
