Abstract
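To address the sensitivity of existing speaker identification methods to environmental noise, a text-independent speaker identification method using a single training sample is proposed. The method extracts local features from the time-frequency spectrum of speech; these features are highly robust to white noise, Gaussian noise, and pink noise, and also fully reflect the speaker's basic phonation characteristics. A Bayesian decision method suited to the basic properties of these local features is given. Simulation experiments on English and Chinese speech databases show that the method achieves speaker identification with a single training sample, that its identification accuracy is clearly higher than that of the existing Mel-frequency cepstral coefficient (MFCC) and linear predictive coding cepstral coefficient (LPCC) speech features, and that it is highly robust to white noise and other kinds of environmental noise.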
To alleviate the sensitivity of existing speaker identification methods to noise and environmental sounds, a novel robust text-independent speaker identification approach using a single training sample is proposed. In this method, the main frequency components of an acoustic signal are determined in the time-frequency domain, and their local distributions and variations in that domain are then obtained and regarded as the acoustic local features. These local features are not only robust to white noise and pink noise and invariant to the intensity of the acoustic signal, but also reflect a person's inherent phonation characteristics. A Bayesian decision classifier for these acoustic local features is introduced. Experimental results on speech databases in English and Chinese demonstrate that the proposed approach can perform speaker identification based on a single training sample, and that it yields a higher correct classification rate than conventional acoustic features such as linear predictive coding cepstral coefficients (LPCC) and mel-frequency cepstral coefficients (MFCC). It is also shown that the proposed approach has high tolerance to white noise, pink noise, and environmental sounds.
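The pipeline outlined in the abstract (dominant time-frequency components, local features derived from them, then a Bayesian decision) can be illustrated with a minimal Python sketch. This is not the authors' implementation, since the abstract does not specify the feature definition or the classifier details. The function names (`dominant_peaks`, `local_features`, `fit_histograms`, `log_likelihood`, `identify`), the frame length and hop, the use of peak spacing and frame-to-frame peak shift as the local features, and the histogram-based naive Bayes model with equal priors are all assumptions made here for illustration.

```python
import numpy as np

def dominant_peaks(signal, frame_len=256, hop=128, n_peaks=3):
    """Per frame, return the FFT-bin indices of the n_peaks strongest spectral peaks.

    Assumed stand-in for "main frequency components in the time-frequency domain".
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    peaks = np.empty((n_frames, n_peaks), dtype=int)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        # Local maxima of the magnitude spectrum (interior bins only).
        is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])
        cand = np.flatnonzero(is_peak) + 1
        if cand.size < n_peaks:
            raise ValueError("frame has fewer spectral peaks than n_peaks")
        strongest = cand[np.argsort(mag[cand])[-n_peaks:]]
        peaks[i] = np.sort(strongest)  # order peaks by frequency
    return peaks

def local_features(peaks):
    """Assumed local features: spacing between adjacent peaks within a frame,
    and the shift of each peak from the previous frame. Both depend only on
    peak positions, so they do not change when the signal is rescaled."""
    spacing = np.diff(peaks, axis=1)[1:]   # shape (frames - 1, n_peaks - 1)
    shift = np.diff(peaks, axis=0)         # shape (frames - 1, n_peaks)
    return np.hstack([spacing, shift])

def fit_histograms(feats, max_abs=64):
    """Laplace-smoothed log-probabilities, one histogram per feature column."""
    clipped = np.clip(feats, -max_abs, max_abs) + max_abs
    n_bins = 2 * max_abs + 1
    log_p = np.empty((feats.shape[1], n_bins))
    for j in range(feats.shape[1]):
        counts = np.bincount(clipped[:, j], minlength=n_bins) + 1.0
        log_p[j] = np.log(counts / counts.sum())
    return log_p

def log_likelihood(feats, log_p, max_abs=64):
    """Sum of log-probabilities, treating feature columns and frames as independent."""
    clipped = np.clip(feats, -max_abs, max_abs) + max_abs
    cols = np.arange(feats.shape[1])
    return log_p[cols, clipped].sum()

def identify(models, signal):
    """Bayesian decision with equal priors: pick the speaker with the highest likelihood."""
    feats = local_features(dominant_peaks(signal))
    scores = {name: log_likelihood(feats, log_p) for name, log_p in models.items()}
    return max(scores, key=scores.get)
```

In this sketch, each speaker model would be built from a single recording via `fit_histograms(local_features(dominant_peaks(recording)))`, and the resulting dictionary of models passed to `identify` together with a test recording. Because the features are built from peak positions rather than magnitudes, they are unaffected by overall signal gain, which mirrors the intensity invariance claimed in the abstract without reproducing the paper's actual feature definition.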
Authors
郭建敏
王晅
GUO Jianmin, WANG Xuan (School of Physics and Information Technology, Shaanxi Normal University, Xi'an 710119, Shaanxi, China)
Source
《陕西师范大学学报(自然科学版)》
CAS
CSCD
PKU Core (北大核心)
2016, Issue 5, pp. 33-38 (6 pages)
Journal of Shaanxi Normal University: Natural Science Edition
Funding
National Natural Science Foundation of China (61373083)
Natural Science Foundation of Shaanxi Province (2009JM8003)
Keywords
speaker recognition (说话人识别)
time-frequency local features (时频局部特征)
linear predictive coding cepstral coefficients (线性预测编码)
Mel-frequency cepstral coefficients (MEL频率倒谱系数)
Bayesian decision (贝叶斯决策)