摘要
为了改善发声力度对说话人识别系统性能的影响,在训练语音存在少量耳语、高喊语音数据的前提下,提出了使用最大后验概率(MAP)和约束最大似然线性回归(CMLLR)相结合的方法来更新说话人模型、投影转换说话人特征。其中,MAP自适应方法用于对正常语音训练的说话人模型进行更新,而CMLLR特征空间投影方法则用来投影转换耳语、高喊测试语音的特征,从而改善训练语音与测试语音的失配问题。实验结果显示,采用MAP+CMLLR方法时,说话人识别系统等错误率(EER)明显降低,与基线系统、最大后验概率(MAP)自适应方法、最大似然线性回归(MLLR)模型投影方法和约束最大似然线性回归(CMLLR)特征空间投影方法相比,MAP+CMLLR方法的平均等错率分别降低了75.3%、3.5%、72%和70.9%。实验结果表明,所提出方法削弱了发声力度对说话人区分性的影响,使说话人识别系统对于发声力度变化更加鲁棒。
To improve the performance of recognition system which is influenced by the change of vocal effort, in the premise of a small amount of whisper and shouted speech data in training speech data, Maximum A Posteriori (MAP) and Constraint Maximum Likelihood Linear Regression (CMLLR) were combined to update the speaker model and transform the speaker characteristics. MAP adaption method was used to update the speaker model of normal speech training, and the CMLLR feature space projection method was used to project and transform the features of whisper and shouted testing speech to improve the mismatch between training speech and testing speech. Experimental results show that the Equal Error Rate (EER) of speaker recognition system was significantly reduced by using the proposed method. Compared with the baseline system, MAP adaptation method, Maximum Likelihood Linear Regression (MLLR) model projection method and CMLLR feature space projection method, the average EER is reduced by 75.3%, 3.5%, 72%, 70.9%, respectively. The experimental results prove that the proposed method weakens the influence on discriminative power for vocal effort and makes the speaker recognition system more robust to vocal effort variability.
出处
《计算机应用》
CSCD
北大核心
2017年第3期906-910,共5页
journal of Computer Applications
基金
贵州省社会攻关计划项目(黔科合SY字[2013]3105号)
贵州省工程技术研究中心建设项目(黔科合G字[2014]4002号)~~
关键词
说话人识别
发声力度
最大后验概率
最大似然线性回归
约束最大似然线性回归
speaker recognition
vocal effort
Maximum A Posteriori (MAP)
Maximum Likelihood Linear Regression(MLLR)
Constraint Maximum Likelihood Linear Regression (CMLLR)