Tibetan speech emotion recognition based on multi-feature fusion (cited by: 1)
Abstract: Tibetan speech emotion recognition (SER) is the application of SER to minority-language speech processing. SER is an important research direction in human-computer interaction, and two of its central problems are extracting the features that best characterize speech emotion and building acoustic models that are robust and generalize well. To build an efficient Tibetan SER model targeted at the language, this paper constructs a Tibetan speech emotion dataset, TBSEC001, and proposes a handcrafted speech emotion feature set suited to Tibetan, TPEFS. The feature set is extracted manually on the basis of the commonalities and differences between Tibetan and other languages. TPEFS achieves good results with classical models such as the support vector machine (SVM), multilayer perceptron (MLP), convolutional neural network (CNN), and long short-term memory network (LSTM). The proposed method achieves a recognition rate of 88.4% on the Tibetan speech dataset TBSEC001, and 84.1%, 74.3%, and 82.5% on the EMODB, RAVDESS, and CASIA databases, respectively. The experimental results show that the feature set is well targeted at Tibetan SER while maintaining the recognition rate.
Authors: GU Zeyue (谷泽月); BIANBA Wangdui (边巴旺堆); QI Jindong (祁晋东) (School of Information Science and Technology, Tibet University, Lhasa 850000, China; National Experimental Teaching Demonstration Center of Information Technology, Lhasa 850000, China)
Source: Modern Electronics Technique (《现代电子技术》), 2023, No. 21, pp. 129-133 (5 pages)
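Funding: Tibet Autonomous Region Plateau Communication Research and Innovation Team Project (XZZZQ2018003); Tibet University Graduate High-Level Talent Training Program (2021-GSP-S121).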
Keywords: speech emotion recognition (SER); feature extraction; deep learning; deep feature; sound quality; multi-modal emotion recognition

