Improved Chinese speech emotion recognition network based on residual network (基于残差网络改进的中文语音情感识别). Cited by: 3
Abstract: To address the low accuracy of Chinese speech emotion recognition with small samples, an improved Chinese speech emotion recognition network structure based on residual networks, AResnet, was proposed. The speech emotion data were augmented with more complex simulated samples using time-domain and frequency-domain augmentation, and an attention mechanism was introduced into the residual network to focus on the distribution of emotion features in the spectrogram and improve the emotion recognition rate. The model was trained and tested on the CASIA Chinese speech dataset. The results show that, compared with the DCNN+LSTM and Trumpt-6 network structures, the recognition rate of the proposed method increases by about 14.9% and 3% respectively, which verifies the effectiveness of AResnet in Chinese speech emotion recognition. The method was also evaluated on the English speech dataset eNTERFACE'05, achieving a recognition accuracy of 92% and showing that AResnet has good generalization ability.
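The two techniques the abstract names, spectrogram augmentation and an attention mechanism inside a residual network, can be sketched in a few lines of PyTorch. The sketch below is only an illustration under assumptions: SpecAugment-style time/frequency masking stands in for the paper's unspecified augmentation, and a squeeze-and-excitation channel gate stands in for its attention mechanism; the names (augment_spectrogram, AttentionResidualBlock), layer sizes, and overall layout are hypothetical and not taken from the paper.

```python
# Minimal sketch, assuming SpecAugment-style masking and SE-style channel attention;
# the actual AResnet architecture and augmentation details are not specified here.
import torch
import torch.nn as nn


def augment_spectrogram(spec: torch.Tensor,
                        max_time_mask: int = 20,
                        max_freq_mask: int = 10) -> torch.Tensor:
    """Apply one random frequency mask and one random time mask to a
    (freq_bins, time_steps) spectrogram to create a new training sample."""
    spec = spec.clone()
    freq_bins, time_steps = spec.shape

    # Frequency-domain augmentation: zero out a random band of frequency bins.
    f = torch.randint(0, max_freq_mask + 1, (1,)).item()
    f0 = torch.randint(0, max(1, freq_bins - f), (1,)).item()
    spec[f0:f0 + f, :] = 0.0

    # Time-domain augmentation: zero out a random span of time steps.
    t = torch.randint(0, max_time_mask + 1, (1,)).item()
    t0 = torch.randint(0, max(1, time_steps - t), (1,)).item()
    spec[:, t0:t0 + t] = 0.0
    return spec


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style gate that re-weights feature channels."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # per-channel weights
        return x * w


class AttentionResidualBlock(nn.Module):
    """Residual block with the attention gate applied before the skip addition."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.attn = ChannelAttention(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x + self.attn(self.body(x)))
```

Placing the gate on the residual branch before the skip addition lets the network emphasize channels that carry emotion-relevant spectro-temporal patterns, while the identity path preserves the original features.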
Authors: JIA Jing-wen (贾婧雯); CAI Ying (蔡英); ERGU Daji (尔古打机). College of Electronic and Information, Southwest Minzu University, Chengdu 610000, China
Source: Computer Engineering and Design (《计算机工程与设计》, PKU Core Journal), 2023, No. 3, pp. 922-928 (7 pages)
Funding: Graduate Innovation Research Fund of Southwest Minzu University (CX2021SZ38).
Keywords: speech emotion recognition; deep learning; residual network; attention mechanism; small sample; data augmentation; spectrogram

