Abstract
To improve the accuracy of classifying speech frames as voiced, unvoiced, or silent, a new speech classification method based on the stack autoencoder (SAE) is proposed. The method is implemented as a deep neural network composed of an SAE and a Softmax classifier. First, sub-band signal strengths, the residual signal peak, gains, the pitch period, and line spectrum frequencies (LSF) are extracted as a training sequence and used to train the SAE without supervision. Next, the Softmax classifier is trained with supervision on the output of the SAE. Finally, the whole network is fine-tuned with supervision to obtain the final network parameters. Experimental results show that the classification accuracy of the proposed algorithm is higher than that of traditional algorithms under different background noises and different signal-to-noise ratios (SNR), and that the advantage grows as the SNR decreases.
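The three training stages described in the abstract (unsupervised layer-wise training of the SAE, supervised training of the Softmax classifier on the SAE output, and supervised fine-tuning of the whole network) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the tanh activation, the linear decoder, the layer sizes, the learning rates, and the epoch counts are all assumed, and `make_toy_frames` is a hypothetical helper that generates synthetic clusters in place of the real speech parameters.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def pretrain_layer(X, n_hidden, rng, epochs=300, lr=0.05):
    """Stage 1 (unsupervised): fit one tanh autoencoder to reconstruct X, keep the encoder."""
    n_in = X.shape[1]
    W = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden)); b = np.zeros(n_hidden)
    V = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_in)); c = np.zeros(n_in)
    for _ in range(epochs):
        H = np.tanh(X @ W + b)
        dR = (H @ V + c - X) / len(X)         # gradient of 0.5 * squared error, averaged over frames
        dH = (dR @ V.T) * (1.0 - H ** 2)      # backprop through tanh, using V before its update
        V -= lr * (H.T @ dR); c -= lr * dR.sum(axis=0)
        W -= lr * (X.T @ dH); b -= lr * dH.sum(axis=0)
    return W, b

def forward(X, layers):
    """Return the activations of every layer, starting with the input itself."""
    acts = [X]
    for W, b in layers:
        acts.append(np.tanh(acts[-1] @ W + b))
    return acts

def train_sae_softmax(X, y, hidden=(8, 4), n_classes=3, seed=0, epochs=500, lr=0.2):
    rng = np.random.default_rng(seed)
    Y = np.eye(n_classes)[y]                  # one-hot labels
    # Stage 1: greedy layer-wise unsupervised pretraining of the stack.
    layers, H = [], X
    for n_h in hidden:
        W, b = pretrain_layer(H, n_h, rng)
        layers.append((W, b))
        H = np.tanh(H @ W + b)
    # Stage 2: supervised Softmax training on the (frozen) SAE output.
    Ws = np.zeros((H.shape[1], n_classes)); bs = np.zeros(n_classes)
    for _ in range(epochs):
        G = (softmax(H @ Ws + bs) - Y) / len(X)
        Ws -= lr * (H.T @ G); bs -= lr * G.sum(axis=0)
    # Stage 3: supervised fine-tuning of the whole network by backpropagation.
    for _ in range(epochs):
        acts = forward(X, layers)
        G = (softmax(acts[-1] @ Ws + bs) - Y) / len(X)
        delta = (G @ Ws.T) * (1.0 - acts[-1] ** 2)
        Ws -= lr * (acts[-1].T @ G); bs -= lr * G.sum(axis=0)
        for i in reversed(range(len(layers))):
            W, b = layers[i]
            grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:                         # propagate the error to the layer below
                delta = (delta @ W.T) * (1.0 - acts[i] ** 2)
            layers[i] = (W - lr * grad_W, b - lr * grad_b)
    return layers, Ws, bs

def predict(X, model):
    layers, Ws, bs = model
    return softmax(forward(X, layers)[-1] @ Ws + bs).argmax(axis=1)

def make_toy_frames(n_per_class=100, seed=0):
    """Hypothetical stand-in data: 3 Gaussian clusters in 6 dimensions, NOT real speech features."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for k in range(3):
        mean = np.zeros(6); mean[2 * k: 2 * k + 2] = 2.0
        X.append(mean + 0.5 * rng.normal(size=(n_per_class, 6)))
        y.append(np.full(n_per_class, k))
    X = np.vstack(X)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature
    return X, np.concatenate(y)
```

In practice, each row of `X` would hold the parameters extracted from one speech frame, and `y` would hold the frame label (for example 0 for silence, 1 for unvoiced, 2 for voiced, a labeling chosen here purely for illustration). Calling `predict(X, train_sae_softmax(X, y))` then returns one class index per frame.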
Authors
马鸿飞
赵月娇
刘珂
刘浩
MA Hongfei, ZHAO Yuejiao, LIU Ke, LIU Hao (State Key Lab. of Integrated Service Networks, Xidian Univ., Xi'an 710071, China)
Source
《西安电子科技大学学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2017, No. 5, pp. 13-17 (5 pages)
Journal of Xidian University
Keywords
deep learning
stack autoencoder
speech processing
speech classification