Abstract
We propose a closed-loop approach to beamforming for multi-channel speech recognition. The approach iterates between neural-network-based ideal ratio mask (IRM) estimation and speech-recognition-based voice activity detection (VAD), and uses the information from both to drive the beamformer. On the CHiME-4 task of recognizing real 6-channel microphone-array speech, and combined with data augmentation and the fusion of input features and output scores, the proposed approach significantly outperforms the method based on the complex Gaussian mixture model (CGMM), yielding a relative word error rate reduction of about 24.1%.
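As a rough illustration of how a time-frequency ratio mask can drive a beamformer, the following is a minimal sketch and not the authors' implementation. The abstract does not name the beamformer, so the sketch assumes a mask-weighted spatial covariance estimate followed by MVDR weights, a common choice in mask-based beamforming. The function names, array layouts, and the diagonal-loading constant are all hypothetical, and the closed-loop refinement through speech recognition and VAD is not shown.

```python
import numpy as np


def masked_covariance(stft, mask):
    """Mask-weighted spatial covariance for each frequency bin.

    stft: complex array, shape (channels, frames, freqs)  [assumed layout]
    mask: real array in [0, 1], shape (frames, freqs)
    returns: complex array, shape (freqs, channels, channels)
    """
    # Sum over frames of mask[t, f] * y[:, t, f] * y[:, t, f]^H.
    cov = np.einsum("tf,ctf,dtf->fcd", mask, stft, stft.conj())
    # Normalize by the total mask weight per frequency (guard against zero).
    norm = np.maximum(mask.sum(axis=0), 1e-10)
    return cov / norm[:, None, None]


def mvdr_weights(cov_speech, cov_noise, ref_channel=0, loading=1e-6):
    """MVDR weights w = (Phi_n^-1 Phi_s) u / trace(Phi_n^-1 Phi_s).

    u selects the reference channel; `loading` is an assumed
    diagonal-loading constant that keeps the solve well conditioned.
    """
    n_ch = cov_noise.shape[-1]
    loaded = cov_noise + loading * np.eye(n_ch)
    numerator = np.linalg.solve(loaded, cov_speech)      # (freqs, ch, ch)
    trace = np.trace(numerator, axis1=-2, axis2=-1)      # (freqs,)
    return numerator[..., ref_channel] / trace[:, None]  # (freqs, ch)


def beamform(stft, weights):
    """Apply w[f]^H y[:, t, f]; returns an array of shape (frames, freqs)."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

In such a setup the speech mask and its complement (for example `1 - mask`) would each be passed to `masked_covariance` to obtain the speech and noise covariances. In a closed-loop scheme like the one the abstract describes, the masks would then be re-estimated from the enhanced output and the recognizer's VAD decisions before the next pass.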
Source
《信息技术与标准化》
2018, No. 8, pp. 65-69, 72 (6 pages in total)
Information Technology & Standardization