Abstract
To further improve time-frequency mask-based beamforming, we propose a beamforming method that integrates neural network (NN) based complex time-frequency mask estimation with spatial clustering (SC) based real-valued time-frequency mask estimation, with the aim of improving the accuracy of source presence probability estimation. The method first extracts time-frequency features and spatial features from the input signal, then feeds the time-frequency features to a trained neural network to obtain a complex time-frequency mask. Exploiting the amplitude and phase information carried by the complex mask improves the accuracy of the estimated source presence probability. The NN-estimated source presence probability then serves as the initial time-frequency mask of the SC-based method, and the mask is re-estimated iteratively with the expectation-maximization (EM) algorithm, which mitigates the performance degradation that NN-based methods suffer under data mismatch. Experimental results show that the proposed integrated method achieves a 7.6% relative word error rate (WER) reduction compared with the baseline system.
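The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration, since the abstract does not name these details: the presence probability is taken as the clipped magnitude of the complex mask, the spatial clustering model is a two-class complex Gaussian mixture with uniform mixture weights, and the beamformer is a mask-based MVDR. The function names `presence_from_complex_mask`, `em_refine`, and `mvdr_weights` are hypothetical and not taken from the paper.

```python
import numpy as np

def presence_from_complex_mask(cmask):
    """Assumed mapping: clipped magnitude of the complex mask as presence probability."""
    return np.clip(np.abs(cmask), 0.0, 1.0)

def em_refine(Y, p_speech, n_iter=5, eps=1e-6):
    """Refine an initial speech mask with EM on a two-class complex Gaussian mixture.

    Y: complex STFT, shape (F, T, M). p_speech: initial mask, shape (F, T).
    Assumed model: y ~ CN(0, phi[k,f,t] * R[k,f]) with uniform mixture weights.
    """
    F, T, M = Y.shape
    gamma = np.clip(np.stack([p_speech, 1.0 - p_speech]), eps, 1.0)  # (2, F, T)
    outer = Y[..., :, None] * Y[..., None, :].conj()                 # y y^H, (F, T, M, M)
    reg = eps * np.eye(M)
    # Initial spatial covariances from the NN-derived mask.
    R = np.einsum('kft,ftmn->kfmn', gamma, outer) / gamma.sum(-1)[..., None, None] + reg
    for _ in range(n_iter):
        # Time-varying variance: phi = y^H R^{-1} y / M.
        R_inv = np.linalg.inv(R)
        phi = np.einsum('ftm,kfmn,ftn->kft', Y.conj(), R_inv, Y).real / M
        phi = np.maximum(phi, eps)
        # E-step: posterior of each class, computed in the log domain.
        _, logdet = np.linalg.slogdet(R)                             # (2, F)
        loglik = -M * np.log(phi) - logdet[..., None]
        loglik -= loglik.max(axis=0, keepdims=True)
        gamma = np.exp(loglik)
        gamma /= gamma.sum(axis=0, keepdims=True)
        # M-step: update the spatial covariance of each class.
        num = np.einsum('kft,ftmn->kfmn', gamma / phi, outer)
        R = num / (gamma.sum(-1)[..., None, None] + eps) + reg
    return gamma[0]

def mvdr_weights(Y, mask, ref=0, eps=1e-6):
    """Mask-based MVDR weights, w = (Rn^{-1} Rs) u_ref / trace(Rn^{-1} Rs), shape (F, M)."""
    M = Y.shape[-1]
    outer = Y[..., :, None] * Y[..., None, :].conj()
    Rs = np.einsum('ft,ftmn->fmn', mask, outer) / (mask.sum(-1)[:, None, None] + eps)
    noise = 1.0 - mask
    Rn = np.einsum('ft,ftmn->fmn', noise, outer) / (noise.sum(-1)[:, None, None] + eps)
    Rn = Rn + eps * np.eye(M)
    ratio = np.linalg.solve(Rn, Rs)                                  # Rn^{-1} Rs, (F, M, M)
    trace = np.trace(ratio, axis1=-2, axis2=-1)[:, None]
    return ratio[..., ref] / (trace + eps)

def beamform(Y, w):
    """Apply w^H y per time-frequency bin; returns shape (F, T)."""
    return np.einsum('fm,ftm->ft', w.conj(), Y)
```

In use, one would pass the network's complex mask through `presence_from_complex_mask`, refine it with `em_refine` on the multichannel STFT, and feed the refined mask to `mvdr_weights` and `beamform`. The fixed class order (index 0 for speech) relies on the NN initialization to avoid the permutation ambiguity that a randomly initialized clustering would face.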
Authors
GUO Xiaobo (郭晓波)
QU Dan (屈丹)
YANG Xukui (杨绪魁)
LIU Chengran (刘诚然)
Information Engineering University, Zhengzhou 450001, China
Source
Journal of Information Engineering University (《信息工程大学学报》)
2021, No. 4, pp. 385-392 (8 pages)
Funding
Supported by the National Natural Science Foundation of China (61673395, 62171470).
Keywords
time-frequency mask
beamforming
integration
complex ideal ratio mask