
Far-field Speech Recognition Method Based on Improved BFDNN
Abstract: To address far-field speech recognition in complex acoustic environments, this paper proposes an improved method that jointly trains a deep neural network (DNN)-based beamformer and the acoustic model. First, multichannel cross-correlation coefficients (MCCC) are extracted from the microphone-array signals and used to estimate the weights of a frequency-domain beamformer; the array signals are then filtered with these weights to produce an enhanced signal. Next, Mel filter bank (Fbank) features are extracted from the enhanced signal and passed to the acoustic model for training and recognition. Finally, the recognition information is fed back to the beamforming network (BFDNN) to update the network parameters. A large-vocabulary far-field speech recognition system was built by combining Theano with the Kaldi toolkit. Simulation results demonstrate the effectiveness of the method.
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2018, No. 5X, pp. 182-185, 191 (5 pages)
Funding: Liaoning Province Public Welfare Research Fund for Scientific Undertakings (No. 20170056); Natural Science Foundation of Liaoning Province (No. 201302022)
Keywords: far-field speech recognition; beamforming; microphone array; MCCC; BFDNN
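
The first two stages named in the abstract, MCCC extraction and frequency-domain filter-and-sum beamforming, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names `mccc` and `filter_and_sum` are hypothetical, the MCCC is computed with the common definition of one minus the determinant of the normalized inter-channel correlation matrix, the channels are assumed to be already time-aligned for the candidate delay, and the conjugate-weight convention is an assumption. In the paper the weights are estimated by a DNN; here they are simply passed in.

```python
import numpy as np


def mccc(frames):
    """Squared multichannel cross-correlation coefficient (illustrative).

    frames: real array of shape (channels, samples), assumed time-aligned.
    Returns 1 - det(R), where R holds the pairwise correlation coefficients.
    Values near 1 indicate coherent channels, values near 0 uncorrelated ones.
    """
    r_norm = np.corrcoef(frames)
    return 1.0 - np.linalg.det(r_norm)


def filter_and_sum(stft, weights):
    """Frequency-domain filter-and-sum beamformer (illustrative).

    stft:    complex array of shape (channels, frames, bins).
    weights: complex array of shape (channels, bins), one weight per
             channel and frequency bin (assumed given, e.g. by a network).
    Returns the enhanced spectrum of shape (frames, bins):
        Y[t, f] = sum_m conj(weights[m, f]) * stft[m, t, f]
    """
    return np.einsum("mf,mtf->tf", np.conj(weights), stft)
```

With equal weights of `1 / channels` and identical channels, `filter_and_sum` returns the single-channel spectrum unchanged, which is a quick sanity check before plugging in learned weights. Fbank extraction and the acoustic model would follow on the enhanced spectrum and are outside this sketch.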
