Abstract
Under low-resource conditions, little labeled training data is available, so the performance of speech recognition systems is often unsatisfactory. To address this problem, this paper first investigated the long short-term memory (LSTM) recurrent neural network (RNN) for acoustic modeling. The LSTM RNN models long time series and can therefore make full use of contextual information, and a linear projection layer is introduced to reduce the number of model parameters. The paper then explored speaker modeling in the feature space and extracted identity vectors (i-vectors), which capture speaker and channel information simultaneously. Finally, it combined these two lines of work into an LSTM RNN system that takes i-vector features as input. Results on the standard Open KWS 2013 data set show that this system achieves a relative reduction of about 10% in token error rate (TER) compared with the deep neural network (DNN) baseline system.
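The combined system described above can be illustrated with a minimal, dependency-free sketch: an LSTM cell with a linear recurrent projection layer (often called LSTMP), fed with acoustic frames to which a per-speaker i-vector has been appended. Appending the same i-vector to every frame of an utterance is one common way to use i-vector features and is an assumption here, not a detail confirmed by the abstract. All names (`LSTMPCell`, `append_ivector`, `lstm_param_count`), the omission of peephole connections, and the layer sizes in the usage lines are hypothetical, chosen only to show why the projection layer reduces the parameter count.

```python
import math
import random

GATES = ("i", "f", "g", "o")  # input gate, forget gate, cell candidate, output gate


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTMPCell:
    """Hypothetical LSTM cell with a linear recurrent projection (no peepholes).

    The recurrent input is the projected output r_t (size proj_dim) instead of
    the full cell output m_t (size cell_dim), which shrinks the recurrent weights.
    """

    def __init__(self, input_dim, cell_dim, proj_dim, seed=0):
        rng = random.Random(seed)
        self.cell_dim, self.proj_dim = cell_dim, proj_dim
        concat = input_dim + proj_dim  # each gate sees [x_t ; r_{t-1}]
        self.W = {g: rand_matrix(cell_dim, concat, rng) for g in GATES}
        self.b = {g: [0.0] * cell_dim for g in GATES}
        self.W_proj = rand_matrix(proj_dim, cell_dim, rng)  # m_t -> r_t

    def step(self, x, r_prev, c_prev):
        z = list(x) + list(r_prev)
        pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])] for g in GATES}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        g = [math.tanh(v) for v in pre["g"]]
        o = [sigmoid(v) for v in pre["o"]]
        c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
        m = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        r = matvec(self.W_proj, m)  # linear projection, no nonlinearity
        return r, c

    def run(self, frames):
        """Process a sequence of frames; return the projected output per frame."""
        r, c = [0.0] * self.proj_dim, [0.0] * self.cell_dim
        outputs = []
        for x in frames:
            r, c = self.step(x, r, c)
            outputs.append(r)
        return outputs


def append_ivector(frames, ivector):
    """Assumption: the utterance's i-vector is concatenated to every frame."""
    return [list(f) + list(ivector) for f in frames]


def lstm_param_count(input_dim, cell_dim, proj_dim=None):
    """Weights + biases of one LSTM layer, with or without a projection layer."""
    rec_dim = cell_dim if proj_dim is None else proj_dim
    gates = 4 * cell_dim * (input_dim + rec_dim + 1)
    proj = 0 if proj_dim is None else proj_dim * cell_dim
    return gates + proj


if __name__ == "__main__":
    frames = [[0.1 * t, -0.2, 0.3] for t in range(5)]  # toy 3-dim features
    x = append_ivector(frames, [0.5, -0.5])             # toy 2-dim i-vector
    out = LSTMPCell(input_dim=5, cell_dim=8, proj_dim=3).run(x)
    print(len(out), len(out[0]))
    # Hypothetical sizes: 40-dim features + 100-dim i-vector, 1024 cells, 256 projection
    print(lstm_param_count(140, 1024), lstm_param_count(140, 1024, 256))
```

With the hypothetical sizes in the usage lines, the projected layer needs well under half the parameters of the unprojected one, because the four gate matrices see a 256-dimensional recurrent input instead of a 1024-dimensional one, at the cost of one extra 256 x 1024 projection matrix.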
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2017, No. 2, pp. 392-396 (5 pages)
Funding
National Natural Science Foundation of China (grants 61273268, 61370034, 61403224)