Abstract
Conventional speaker recognition systems perform poorly under noisy conditions. A binary mask obtained through computational auditory scene analysis (CASA) can be used to reconstruct the noise-dominated parts of the speech, so that speaker-related information corrupted by noise is restored. However, the quality of the reconstruction depends on the proportion of the frame that is reliable. This paper therefore sets a threshold based on the extracted binary mask and uses it to select frames of the test features, dividing the frames into three classes: frames to be reconstructed, frames to be retained, and frames to be discarded. The reconstructed and retained frames are then passed on to subsequent processing and used for recognition. Experimental results show that the algorithm improves the recognition rate compared with the original reconstruction system.
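The frame selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the threshold values or say which range of reliability maps to which class, so everything specific here is an assumption made for illustration: the function names `reliable_ratio` and `select_frames`, the two cutoffs `low` and `high`, and the rule that highly reliable frames are retained, moderately reliable frames are reconstructed, and the rest are discarded. The mask is assumed to be a list of frames, each a list of 0/1 labels where 1 marks a reliable component.

```python
from typing import List, Tuple


def reliable_ratio(mask_frame: List[int]) -> float:
    """Fraction of components in one frame that the binary mask labels reliable (1)."""
    if not mask_frame:
        return 0.0
    return sum(mask_frame) / len(mask_frame)


def select_frames(
    mask: List[List[int]],
    low: float = 0.3,   # hypothetical cutoff, not taken from the paper
    high: float = 0.8,  # hypothetical cutoff, not taken from the paper
) -> Tuple[List[int], List[int], List[int]]:
    """Split frame indices into (reconstruct, retain, discard).

    Assumed rule (not confirmed by the abstract):
      ratio >= high        -> retain the frame as is
      low <= ratio < high  -> reconstruct the unreliable part
      ratio < low          -> discard the frame
    """
    reconstruct: List[int] = []
    retain: List[int] = []
    discard: List[int] = []
    for index, frame in enumerate(mask):
        ratio = reliable_ratio(frame)
        if ratio >= high:
            retain.append(index)
        elif ratio >= low:
            reconstruct.append(index)
        else:
            discard.append(index)
    return reconstruct, retain, discard


# Toy mask with four frames of four components each.
toy_mask = [
    [1, 1, 1, 1],  # ratio 1.00
    [1, 1, 0, 0],  # ratio 0.50
    [0, 0, 0, 1],  # ratio 0.25
    [1, 1, 1, 0],  # ratio 0.75
]
to_reconstruct, to_retain, to_discard = select_frames(toy_mask)
```

Under these assumed cutoffs, only the frames in `to_reconstruct` and `to_retain` would be passed on to recognition, which matches the abstract's statement that discarded frames take no part in later processing.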
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
Indexed in: CSCD, Peking University Core Journals
2016, No. 5, pp. 1107-1111 (5 pages)