
Whispered Speech Emotion Recognition Embedded with Markov Networks and Multi-Scale Decision Fusion (嵌入马尔可夫网络的多尺度判决融合耳语音情感识别) · Cited by: 4
Abstract: We propose a multi-scale framework in the time domain that combines Gaussian mixture models (GMMs) with a Markov network, and apply it to whispered speech emotion recognition. To handle continuous speech signals, GMM-based emotion recognition is carried out at both the short-utterance and the long-utterance scale of the whispered speech signal. According to the dimensional theory of emotion, the emotional information in whispered speech is continuous in time, so a third-order Markov network is used to model the contextual emotion dependencies in the multi-scale analysis. A spring model defines the higher-order deformation in the two-dimensional emotion dimension space, and a fuzzy-entropy evaluation converts the GMM likelihoods into the unary energies of the Markov network. Experimental results show that the proposed algorithm performs well on continuous whispered speech, reaching a recognition rate of 64.3% for anger. The results further show that, unlike in normal (phonated) speech, happiness is relatively difficult to recognize in whispered speech, while anger and sadness are well separated from each other; this is consistent with the human listening experiments of Cirillo and Todt.
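The fusion step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's implementation: the valence-arousal coordinates and unary energies are made up, the unary dicts stand in for the fuzzy-entropy-normalized GMM likelihoods, and a first-order chain is decoded instead of the paper's third-order Markov network.

```python
# Minimal sketch of decision fusion over a chain of utterance segments:
# unary energies (stand-ins for fuzzy-entropy-normalized GMM likelihoods)
# are smoothed by a pairwise "spring" energy in a 2-D valence-arousal space.
# All coordinates and energies are illustrative assumptions.

EMOTIONS = {              # hypothetical (valence, arousal) coordinates
    "anger":     (-0.6,  0.8),
    "sadness":   (-0.7, -0.6),
    "happiness": ( 0.8,  0.5),
    "neutral":   ( 0.0,  0.0),
}

def spring_energy(e1, e2, k=1.0):
    """Pairwise energy: a spring penalizing jumps in the dimension space."""
    (v1, a1), (v2, a2) = EMOTIONS[e1], EMOTIONS[e2]
    return k * ((v1 - v2) ** 2 + (a1 - a2) ** 2)

def decode(unary, k=1.0):
    """Viterbi-style minimum-energy labeling of a chain of segments.

    unary -- list of {emotion: energy} dicts, one per segment (in practice
             these would come from the short- and long-scale GMM scores).
    """
    labels = list(EMOTIONS)
    best, back = [dict(unary[0])], []
    for t in range(1, len(unary)):
        cur, bp = {}, {}
        for e in labels:
            p = min(labels, key=lambda q: best[-1][q] + spring_energy(q, e, k))
            cur[e] = best[-1][p] + spring_energy(p, e, k) + unary[t][e]
            bp[e] = p
        best.append(cur)
        back.append(bp)
    # Trace back the minimum-energy path.
    path = [min(labels, key=lambda e: best[-1][e])]
    for bp in reversed(back):
        path.append(bp[path[-1]])
    return list(reversed(path))

# A noisy middle segment that locally prefers "happiness" is overridden by
# the temporal-continuity prior when the spring constant k is large enough.
unary = [
    {"anger": 0.1, "sadness": 2.0, "happiness": 2.0, "neutral": 2.0},
    {"anger": 1.5, "sadness": 2.0, "happiness": 1.0, "neutral": 2.0},
    {"anger": 0.1, "sadness": 2.0, "happiness": 2.0, "neutral": 2.0},
]
print(decode(unary, k=1.0))   # smoothing on
print(decode(unary, k=0.0))   # no context: per-segment minimum
```

With k=1.0 the spring term makes a one-segment excursion to a distant emotion more expensive than keeping the locally weaker "anger" label, which is the intuition behind modeling temporal continuity in the dimension space.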
Source: Journal of Signal Processing (《信号处理》, CSCD, Peking University core journal), 2013, No. 1, pp. 98-106 (9 pages).
Funding: National Natural Science Foundation of China (Nos. 61231002, 61273266, 51075068); Doctoral Program Foundation of the Ministry of Education (No. 20110092130004); Natural Science Research Foundation of Jiangsu Higher Education Institutions (No. 10KJB510005).
Keywords: speech emotion recognition; multi-scale analysis; Markov networks; decision fusion

References (25)

  1. R. W. Morris. Enhancement and recognition of whispered speech [D]. Georgia Institute of Technology, USA, 2002.
  2. 金赟, 赵艳, 黄程韦, 赵力. Design and establishment of a whispered speech emotion database [J]. 声学技术 (Technical Acoustics), 2010, 29(1): 63-68. Cited by: 8.
  3. 杨莉莉, 李燕, 徐柏龄. Establishment of a Chinese whispered speech corpus and auditory experiments [J]. 南京大学学报(自然科学版) (Journal of Nanjing University, Natural Sciences), 2005, 41(3): 311-317. Cited by: 13.
  4. Chenghui Gong, Heming Zhao, Zhi Tao, Zongyue Yan and Xiaojiang Gu. Feature analysis on emotional Chinese whispered speech [C]. International Conference on Information, Networking and Automation, 2010: V2-137-V2-141.
  5. Chenghui Gong, Heming Zhao, Yanlei Wang, Min Wang and Zongyue Yan. Development of Chinese whispered database for speaker verification [C]. Asia Pacific Conference on Postgraduate Research in Microelectronics & Electronics, 2009: 197-200.
  6. R. L. Trask. 《语音学和音系学字典》 (A Dictionary of Phonetics and Phonology) [M]. Beijing: 语文出版社, 2000: 26.
  7. H. Hübsch, D. Todt and K. Zühlke. Einsatz und soziale Interpretation geflüsterter Signale. In: K. Pawlik and K. H. Stapf (eds.), Umwelt und Verhalten [M]. Toronto, 1992: 391-406.
  8. V. C. Tartter and D. Braun. Hearing smiles and frowns in normal and whisper registers [J]. Journal of the Acoustical Society of America, 1994, 96(4): 2101-2107.
  9. Jasmin Cirillo and Dietmar Todt. Decoding whispered vocalizations: relationships between social and emotional variables [C]. Proceedings of the 9th International Conference on Neural Information Processing, 2002: 1559-1563.
  10. F. Burkhardt, A. Paeschke, M. Rolfes, et al. A database of German emotional speech [C]. Proceedings of the 9th European Conference on Speech Communication and Technology, 2005: 1517-1520.

Secondary references: 49 · Co-citing documents: 60 · Co-cited documents: 18 · Citing documents: 4 · Secondary citing documents: 10
