Abstract
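In speaker recognition, Mel-Frequency Cepstral Coefficients (MFCC) are a commonly used feature, but this general-purpose feature is not very effective for speaker recognition on whispered speech. The triangular filter bank of MFCC is uniformly distributed on the Mel scale, yet whispered speech is produced differently from normal speech. To improve the whispered-speech speaker recognition rate, this uniform distribution is changed: the full frequency range is divided into bands, the density of filters within each band is adjusted separately, and the filters of all bands are then combined into a new filter bank. In text-independent whispered-speech speaker recognition, the modified filter model improves recognition performance over the original model.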
MFCC (Mel-Frequency Cepstral Coefficients) is a commonly used feature in speaker recognition systems, but this general-purpose feature does not work well on whispered speech. The original MFCC uses a bank of triangular filters uniformly distributed on the Mel scale. Because whispered speech is produced differently from normal speech, this paper changes that uniform distribution to improve the speaker recognition rate on whispered speech. Experiments analyzed how the recognition rate varies with the number of filters added to each individual frequency region, and the results were used to select a suitable number of filters for each region. The filters of all regions are then combined to form the final model. The model is designed as an adaptation to the voice source, and it performs well in text-independent speaker recognition.
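The band-wise filter layout described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the band edges (0, 1000, 4000, 8000 Hz), the per-band point counts (6, 14, 6), the FFT size, and the sampling rate are all hypothetical values chosen for illustration, and the helper names `banded_mel_points` and `triangular_filterbank` are invented here. The sketch assumes the common conversion mel = 2595 log10(1 + f/700); within each band the filter points are uniform on the Mel scale, while the density differs from band to band.

```python
import numpy as np

def hz_to_mel(f):
    # Common Hz-to-mel conversion (assumed; the paper may use another variant).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def banded_mel_points(band_edges_hz, points_per_band):
    """Filter edge/center frequencies in Hz.

    Each band gets its own number of points, spaced uniformly in mel
    inside that band, so the density varies across bands.
    """
    pieces = []
    for lo, hi, n in zip(band_edges_hz[:-1], band_edges_hz[1:], points_per_band):
        # n points starting at the band's lower edge; the upper edge is
        # left to the next band so that no point is duplicated.
        mels = np.linspace(hz_to_mel(lo), hz_to_mel(hi), n + 1)[:-1]
        pieces.append(mel_to_hz(mels))
    pieces.append(np.array([float(band_edges_hz[-1])]))
    return np.concatenate(pieces)

def triangular_filterbank(points_hz, n_fft, sample_rate):
    """One triangular filter per interior point, over the FFT bin frequencies."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    n_filters = len(points_hz) - 2
    fb = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        left, center, right = points_hz[i:i + 3]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

# Hypothetical layout: denser filters in the 1-4 kHz band than elsewhere.
band_edges_hz = [0.0, 1000.0, 4000.0, 8000.0]
points_per_band = [6, 14, 6]
points = banded_mel_points(band_edges_hz, points_per_band)
fb = triangular_filterbank(points, n_fft=512, sample_rate=16000)
```

With these illustrative settings the combined bank has `sum(points_per_band) - 1` filters. Changing one entry of `points_per_band` changes the density in that band only, which mirrors the per-band adjustment the abstract describes.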
Source
《苏州大学学报(工科版)》
CAS
2009, Issue 4, pp. 59-64 (6 pages)
Journal of Soochow University Engineering Science Edition (Bimonthly)