Abstract
To address far-field speech recognition in complex environments, this paper proposes an improved method for jointly training a beamformer based on deep neural networks (DNN) and an acoustic model. First, the multichannel cross-correlation coefficient (MCCC) is extracted from the microphone array signals and used to estimate the weights of a frequency-domain beamformer, which then filters the array signals to produce an enhanced signal. Next, Mel filter bank (Fbank) features are extracted from the enhanced signal and passed to the acoustic model for training and recognition. Finally, the recognition information is fed back to the beamforming network (BFDNN) to update the network parameters. Experiments were carried out on a large-vocabulary far-field speech recognition system built with Theano and the Kaldi toolkit. The simulation results demonstrate the effectiveness of the proposed method.
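The filtering step in the abstract, where frequency-domain beamformer weights are applied to the array signals to form an enhanced signal, can be illustrated with a minimal filter-and-sum sketch. This is not the authors' implementation: in the paper the weights are estimated by the BFDNN from MCCC features, whereas here hand-built delay-and-sum weights stand in for the network output. The function names, the conjugate-weight convention, and the per-frame data layout are all assumptions made for illustration.

```python
import cmath


def filter_and_sum(frame, weights):
    """Apply per-channel, per-bin complex weights to one STFT frame.

    frame[c][f] and weights[c][f] are complex values for channel c and
    frequency bin f. The enhanced bin is sum_c conj(w[c][f]) * x[c][f]
    (conjugate convention assumed here, not taken from the paper).
    """
    num_channels = len(frame)
    num_bins = len(frame[0])
    if len(weights) != num_channels or any(len(w) != num_bins for w in weights):
        raise ValueError("frame and weights must have the same shape")
    return [
        sum(weights[c][f].conjugate() * frame[c][f] for c in range(num_channels))
        for f in range(num_bins)
    ]


def delay_and_sum_weights(phase_lags):
    """Hypothetical stand-in for the BFDNN output.

    phase_lags[c][f] is the phase lag (radians) of channel c relative to
    the source in bin f. The returned weights undo that lag and average
    the channels.
    """
    num_channels = len(phase_lags)
    return [[cmath.exp(-1j * p) / num_channels for p in ch] for ch in phase_lags]


# Two microphones, three frequency bins; the second channel lags the first.
source = [1 + 2j, -0.5 + 0.25j, 3 - 1j]
phase_lags = [[0.0, 0.0, 0.0], [0.3, 0.6, 0.9]]
frame = [[s * cmath.exp(-1j * p) for s, p in zip(source, ch)] for ch in phase_lags]

enhanced = filter_and_sum(frame, delay_and_sum_weights(phase_lags))
```

With noise-free inputs and exact phase lags, `enhanced` matches `source` up to floating-point error. In a jointly trained system the weights would instead come from the network, and the recognition loss would be backpropagated through this filtering step.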
Source
《电脑知识与技术》
2018, Issue 5X, pp. 182-185, 191 (5 pages in total)
Computer Knowledge and Technology
Funding
Liaoning Provincial Science Public Welfare Research Fund Project (No. 20170056)
Natural Science Foundation of Liaoning Province (No. 201302022)