
Long Short-Term Memory Recurrent Neural Network Acoustic Models Using i-vector for Low-Resource Speech Recognition

Cited by: 21
Abstract: In low-resource conditions, little labeled training data is available, so speech recognition systems built on it often perform poorly. To address this problem, this paper first investigates long short-term memory (LSTM) recurrent neural networks for acoustic modeling; they model long sequences to make full use of context information, and a linear projection layer is introduced to reduce the number of model parameters. It then studies speaker modeling in the feature space and extracts identity vectors (i-vectors), which capture both speaker and channel information. Finally, it combines the two to build an LSTM recurrent neural network system based on i-vector features. Experiments on the standard Open KWS 2013 data set show that this approach achieves a relative reduction of about 10% in token error rate (TER) compared with the deep neural network (DNN) baseline system.
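Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2017, Issue 2, pp. 392-396 (5 pages)
Funding: National Natural Science Foundation of China (61273268, 61370034, 61403224)
Keywords: speech recognition; long short-term memory (LSTM) neural network; identity vector (i-vector)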
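
The abstract names two ideas that a short sketch can make concrete: a linear projection layer that shrinks the LSTM's recurrent weights, and an utterance-level i-vector appended to every acoustic frame. The Python sketch below is illustrative only and is not the authors' implementation. The function names, the peephole-free LSTM parameterization, and all dimensions (40-dimensional frames, a 100-dimensional i-vector, 1024 cells, a 256-unit projection) are assumptions chosen for the example, not values taken from the paper.

```python
import numpy as np


def lstm_param_count(input_dim: int, cell_dim: int) -> int:
    """Parameters of one standard LSTM layer (no peepholes, assumed here).

    Each of the 4 gates has an input matrix, a recurrent matrix, and a bias.
    """
    return 4 * (input_dim * cell_dim + cell_dim * cell_dim + cell_dim)


def lstmp_param_count(input_dim: int, cell_dim: int, proj_dim: int) -> int:
    """Parameters of one LSTM layer with a linear projection of the output.

    The recurrent input is the projected output (proj_dim), so the recurrent
    matrices shrink; the projection matrix itself adds cell_dim * proj_dim.
    """
    gates = 4 * (input_dim * cell_dim + proj_dim * cell_dim + cell_dim)
    projection = cell_dim * proj_dim
    return gates + projection


def append_ivector(frames: np.ndarray, ivector: np.ndarray) -> np.ndarray:
    """Concatenate one utterance-level i-vector onto every frame.

    frames:  (num_frames, feat_dim) acoustic features
    ivector: (ivec_dim,) speaker/channel vector for the whole utterance
    """
    tiled = np.tile(ivector, (frames.shape[0], 1))
    return np.hstack([frames, tiled])


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    feat_dim, ivec_dim, cell_dim, proj_dim = 40, 100, 1024, 256
    frames = np.random.randn(300, feat_dim)   # 300 frames of one utterance
    ivector = np.random.randn(ivec_dim)       # stand-in for an extracted i-vector

    augmented = append_ivector(frames, ivector)
    print("augmented feature shape:", augmented.shape)

    input_dim = feat_dim + ivec_dim
    print("LSTM parameters: ", lstm_param_count(input_dim, cell_dim))
    print("LSTMP parameters:", lstmp_param_count(input_dim, cell_dim, proj_dim))
```

Under these assumed sizes the projected layer needs well under half the parameters of the unprojected one, which is the kind of saving the abstract attributes to the projection layer. A real system would extract the i-vector with a trained total-variability model rather than draw it at random.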