Abstract
To address the low accuracy of speech emotion recognition when only a small number of samples is available, a small-sample speech emotion recognition method based on generative adversarial networks is proposed. The generator and discriminator learn sample features through adversarial training, and the generator then produces high-quality synthetic samples to augment the data set. The discriminator's parameters are transferred to the emotion recognition network to speed up its training. A long short-term memory (LSTM) network is connected to further extract temporal emotion features and improve the emotion recognition rate. The 535 speech samples of the German speech database (EMODB) were used for training and testing. The results show that, compared with traditional speech emotion recognition methods, a convolutional neural network (CNN), and CNN-LSTM, the proposed method raises the emotion recognition rate by 4.54% to 25.31%, which verifies its effectiveness.
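The parameter-transfer step mentioned in the abstract can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the layer names, layer sizes, and the choice of which layers are shared are hypothetical, and plain Python lists stand in for real network weights. The only idea shown is that layers shared with the trained discriminator are copied into the recognition network, while the new layers (here labelled LSTM and emotion head) keep their fresh initialisation.

```python
import random


def init_layer(n_in, n_out, rng):
    """Return a randomly initialised weight matrix as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def transfer_parameters(discriminator, recognizer, shared_layers):
    """Copy the named layers from the discriminator into the recognizer.

    Layers not listed in shared_layers keep their own initialisation.
    Returns the list of layer names that were copied.
    """
    transferred = []
    for name in shared_layers:
        src = discriminator[name]
        dst = recognizer[name]
        # Refuse to copy when shapes differ, so a mismatch is never hidden.
        if len(src) != len(dst) or len(src[0]) != len(dst[0]):
            raise ValueError(f"shape mismatch for layer {name!r}")
        # Copy row by row so the two networks do not share storage.
        recognizer[name] = [row[:] for row in src]
        transferred.append(name)
    return transferred


rng = random.Random(0)

# Hypothetical layer names and sizes, chosen only for illustration.
discriminator = {
    "conv1": init_layer(8, 4, rng),
    "conv2": init_layer(4, 4, rng),
    "real_fake_head": init_layer(4, 1, rng),  # not transferred
}
recognizer = {
    "conv1": init_layer(8, 4, rng),
    "conv2": init_layer(4, 4, rng),
    "lstm": init_layer(4, 6, rng),          # stand-in for the LSTM weights
    "emotion_head": init_layer(6, 7, rng),  # EMODB has 7 emotion classes
}

copied = transfer_parameters(discriminator, recognizer, ["conv1", "conv2"])
print("transferred layers:", copied)
```

In a real deep learning framework the same step would normally be done with the framework's own weight-loading mechanism; the dictionary copy here only makes the bookkeeping explicit.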
Authors
高英宁
崔艳荣
孙存威
GAO Ying-ning; CUI Yan-rong; SUN Cun-wei (School of Computer Science, Yangtze River University, Jingzhou 434023, China; School of Computer Science and Engineering, University of Electronic Science and Technology, Chengdu 611731, China)
Source
《计算机工程与设计》 (Computer Engineering and Design)
Peking University Core Journal
2020, No. 12, pp. 3550-3556 (7 pages)
Keywords
generative adversarial networks
speech emotion recognition
small sample
data augmentation
long short-term memory networks
transfer learning