Abstract
Traditional voice activity detection (VAD) methods separate consonants poorly from background noise, especially unvoiced consonants corrupted by noise. To address this problem, this paper proposes a VAD method based on Mel Frequency Cepstrum Coefficients with Fisher linear discriminant analysis (F-MFCC). Distinguishing unvoiced speech from background noise is treated as a two-class classification problem, and the Fisher criterion is used to solve for the optimal discriminative projection direction, so that the projected feature parameters have minimum within-class scatter and maximum between-class scatter. This increases the separability of unvoiced speech and background noise. Experiments on different speech corpora show that F-MFCC improves VAD accuracy under different signal-to-noise ratios (SNRs) and background noise conditions.
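The projection step described in the abstract can be illustrated with a minimal sketch of two-class Fisher linear discriminant analysis. This is not the authors' implementation: the feature vectors are synthetic stand-ins for MFCC frames, and the names `fisher_direction` and `fisher_ratio`, the 12-dimensional feature size, and the small ridge term are assumptions made for illustration only.

```python
import numpy as np

def fisher_direction(class_a, class_b):
    """Return a unit vector maximizing the two-class Fisher criterion."""
    mean_a = class_a.mean(axis=0)
    mean_b = class_b.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    centered_a = class_a - mean_a
    centered_b = class_b - mean_b
    s_w = centered_a.T @ centered_a + centered_b.T @ centered_b
    # Small ridge term (an assumption of this sketch) keeps s_w invertible.
    s_w += 1e-6 * np.eye(s_w.shape[0])
    # Closed-form solution: w is proportional to S_w^{-1} (mean_a - mean_b).
    w = np.linalg.solve(s_w, mean_a - mean_b)
    return w / np.linalg.norm(w)

def fisher_ratio(proj_a, proj_b):
    """Between-class over within-class scatter of 1-D projected samples."""
    between = (proj_a.mean() - proj_b.mean()) ** 2
    within = proj_a.var() * len(proj_a) + proj_b.var() * len(proj_b)
    return between / within

rng = np.random.default_rng(0)
dim = 12  # hypothetical number of cepstral coefficients per frame

# Synthetic, correlated stand-ins for MFCC frames (NOT real speech data).
mixing = rng.normal(size=(dim, dim))
unvoiced = rng.normal(size=(400, dim)) @ mixing + 0.5
noise = rng.normal(size=(400, dim)) @ mixing

w = fisher_direction(unvoiced, noise)
ratio_fisher = fisher_ratio(unvoiced @ w, noise @ w)

# Baseline: project onto the plain difference of class means.
naive = unvoiced.mean(axis=0) - noise.mean(axis=0)
naive /= np.linalg.norm(naive)
ratio_naive = fisher_ratio(unvoiced @ naive, noise @ naive)

print(f"Fisher direction ratio:   {ratio_fisher:.4f}")
print(f"Mean-difference ratio:    {ratio_naive:.4f}")
```

Comparing the two printed ratios shows why the criterion is useful: the Fisher direction accounts for the correlation structure of the features, so the projected classes are at least as separable as along the plain mean-difference direction.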
Source
《电子与信息学报》
EI
CSCD
PKU Core (Peking University core journal list)
2015, No. 6, pp. 1343-1349 (7 pages)
Journal of Electronics & Information Technology
Keywords
Speech processing
Voice Activity Detection (VAD)
Mel Frequency Cepstrum Coefficient (MFCC)
Fisher linear discriminant analysis