
Research on Emotion Recognition Algorithm Based on Spectrogram Feature Extraction of Bottleneck Feature
Cited by: 7
Abstract: Traditional spectral features such as MFCC are obtained by further processing the spectrogram, but they suffer from two problems: frame-by-frame processing ignores the correlation between the spectral features of adjacent frames, and the extracted features are unrelated to the target labels. As a result, much of the useful information in the spectrogram is lost. To address this, an algorithm for obtaining Deep Spectral Features (DSF) is proposed. Spectral features extracted directly from the spectrogram are used to train a Deep Belief Network (DBN), and bottleneck features are then taken from the bottleneck layer, i.e. the hidden layer with the fewest nodes. To overcome the first drawback, the DSF feature is formed from feature parameters extracted over several adjacent frames of the speech signal. The strong self-learning ability of the DBN and its close relationship with the target labels allow the fine-tuned DSF feature to overcome the second drawback. Extensive simulation results show that, in speech emotion recognition, the fine-tuned DSF feature achieves a recognition rate 3.97% higher than that of traditional MFCC features.
Authors: 李姗, 徐珑婷
Source: Computer Technology and Development (《计算机技术与发展》), 2017, No. 5, pp. 82-86 (5 pages)
Funding: National Natural Science Foundation of China (61271335); National "863" High Technology Research and Development Program (2006AA010102)
Keywords: bottleneck feature; deep belief network; spectral feature; spectrogram; emotion recognition
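
To make the described pipeline concrete, below is a minimal sketch of the DSF idea. It is an illustration only, not the authors' implementation: the frame and context sizes, layer widths, the bottleneck dimension of 39, the librosa-based spectrogram extraction, and the use of a plain PyTorch MLP (standing in for RBM-by-RBM DBN pre-training; only the supervised fine-tuning stage is modelled) are all assumptions, and the file name and emotion count in the usage comment are hypothetical.

```python
# Minimal sketch of the DSF pipeline: spectrogram frames -> adjacent-frame
# stacking -> network with a narrow bottleneck layer -> bottleneck activations
# used as features. All sizes and the MLP-in-place-of-DBN choice are assumed.
import numpy as np
import librosa
import torch
import torch.nn as nn

def spectrogram_frames(wav_path, n_fft=512, hop=160):
    """Log-magnitude spectrogram, one row per frame."""
    y, sr = librosa.load(wav_path, sr=16000)
    spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    return np.log(spec + 1e-6).T                # (frames, n_fft // 2 + 1)

def stack_adjacent(frames, context=5):
    """Concatenate each frame with its neighbours so that the correlation
    between adjacent frames, discarded by per-frame features such as MFCC,
    is kept in the input."""
    pad = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    return np.stack([pad[i:i + 2 * context + 1].reshape(-1)
                     for i in range(len(frames))])

class BottleneckNet(nn.Module):
    """Feed-forward network with a narrow bottleneck layer; after supervised
    fine-tuning on emotion labels, the bottleneck activations are taken as
    the DSF features."""
    def __init__(self, in_dim, n_emotions, bottleneck=39):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.Sigmoid(),
            nn.Linear(1024, 1024), nn.Sigmoid(),
            nn.Linear(1024, bottleneck), nn.Sigmoid())   # bottleneck layer
        self.classifier = nn.Linear(bottleneck, n_emotions)

    def forward(self, x):
        bn = self.encoder(x)                    # DSF features
        return self.classifier(bn), bn

# Usage sketch (hypothetical file and label count):
# X = stack_adjacent(spectrogram_frames("utt001.wav"))
# net = BottleneckNet(X.shape[1], n_emotions=6)
# logits, dsf = net(torch.tensor(X, dtype=torch.float32))
```

The narrow layer forces the network to compress the stacked spectrogram input into a low-dimensional code that is shaped by the emotion labels during fine-tuning, which is the property the abstract relies on to address the label-irrelevance of conventional spectral features.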
