Abstract
To further improve the recognition rate of Chinese speech emotion recognition, an improved stacked autoencoder is proposed that builds on the autoencoder, denoising autoencoder, and sparse autoencoder network structures in deep learning. The first layer uses a denoising autoencoder to learn hidden features whose dimension is larger than that of the input features, and the second layer uses a sparse autoencoder to learn sparse features. A softmax classifier then performs the classification. Training first applies layer-wise pre-training to initialize all network parameters and then fine-tunes the whole network. Experiments on Chinese speech databases show that the proposed structure achieves a better recognition rate than stacked denoising autoencoders or stacked sparse autoencoders used alone. In addition, comparative experiments on the CASIA database show that its recognition rate is 53.7%, 29.8%, 14.3%, and 1.9% higher than that of the K-nearest neighbor algorithm, the sparse representation method, the traditional support vector machine, and the artificial neural network, respectively. On a self-recorded speech database, its recognition rate is 1.64% higher than that of the artificial neural network.
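The structure described in the abstract (a denoising first layer with more hidden units than inputs, a sparse second layer, and a softmax classifier on top) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The sigmoid units, squared-error reconstruction loss, masking noise, KL sparsity penalty, layer sizes, learning rates, and the toy data are all assumptions chosen for illustration, and the final fine-tuning of the whole network is omitted for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, rng, noise=0.0, rho=None, beta=0.0,
                      lr=0.5, epochs=200):
    """Train one sigmoid autoencoder with full-batch gradient descent.

    noise > 0 zeroes that fraction of inputs (denoising autoencoder).
    rho/beta add a KL sparsity penalty on mean activations (sparse autoencoder).
    Returns the encoder parameters (W1, b1). All hyperparameters are illustrative.
    """
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        Xin = X * (rng.random(X.shape) >= noise)   # corrupt the input (no-op if noise == 0)
        H = sigmoid(Xin @ W1 + b1)                 # hidden code
        R = sigmoid(H @ W2 + b2)                   # reconstruction of the clean input
        dR = (R - X) * R * (1 - R) / n             # gradient at the output pre-activation
        dH = dR @ W2.T
        if rho is not None:                        # gradient of the KL sparsity penalty
            rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)
            dH += beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / n
        dZ = dH * H * (1 - H)
        W2 -= lr * H.T @ dR;   b2 -= lr * dR.sum(axis=0)
        W1 -= lr * Xin.T @ dZ; b1 -= lr * dZ.sum(axis=0)
    return W1, b1

def train_softmax(H, y, n_classes, lr=0.5, epochs=300):
    """Train a softmax classifier on fixed features H with cross-entropy loss."""
    n, d = H.shape
    W = np.zeros((d, n_classes)); b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                       # one-hot labels
    for _ in range(epochs):
        Z = H @ W + b
        Z -= Z.max(axis=1, keepdims=True)          # numerical stability
        P = np.exp(Z); P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n
        W -= lr * H.T @ G; b -= lr * G.sum(axis=0)
    return W, b

def predict(X, layers, clf):
    """Pass X through the stacked encoders, then pick the highest-scoring class."""
    H = X
    for W, b in layers:
        H = sigmoid(H @ W + b)
    W, b = clf
    return (H @ W + b).argmax(axis=1)

# Toy stand-in data (NOT speech features): 60 samples, 8 features in [0, 1], 2 classes.
rng = np.random.default_rng(0)
X = rng.random((60, 8))
y = (X[:, 0] > 0.5).astype(int)

# Layer 1: denoising autoencoder, hidden size (16) larger than input size (8).
layer1 = train_autoencoder(X, 16, rng, noise=0.2)
H1 = sigmoid(X @ layer1[0] + layer1[1])
# Layer 2: sparse autoencoder trained on the layer-1 codes.
layer2 = train_autoencoder(H1, 6, rng, rho=0.05, beta=0.1)
H2 = sigmoid(H1 @ layer2[0] + layer2[1])
# Softmax classifier on the layer-2 codes (whole-network fine-tuning omitted).
clf = train_softmax(H2, y, 2)
pred = predict(X, [layer1, layer2], clf)
```

In a full pipeline, the pretrained encoder weights and the softmax weights would serve as the initialization for backpropagation through the whole network, which is the fine-tuning stage the abstract refers to.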
Source
《东南大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (北大核心)
2017, No. 4, pp. 631-636 (6 pages)
Journal of Southeast University: Natural Science Edition
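Funding
National Natural Science Foundation of China (61375028, 61571106, 61673108)
Qing Lan Project of Jiangsu Province
Jiangsu Planned Projects for Postdoctoral Research Funds (1601011B)
Six Talent Peaks Project of Jiangsu Province (2016-DZXX-023)
China Postdoctoral Science Foundation (2016M601695)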
Keywords
speech emotion recognition
enhanced stacked autoencoder
denoising autoencoder
sparse autoencoder